APPEAL BRIEF

Appellants hereby submit an original and two copies of this Appeal Brief to the Board of Patent
Appeals and Interferences ("the Board") in response to the Final Office Action mailed on
January 16, 2003. The Notice of Appeal was timely submitted on April 16, 2003, and was received in
the Patent and Trademark Office ("the Office") on April 23, 2003. This Appeal Brief is thus timely
submitted. The Commissioner is hereby authorized to charge the fee for filing this Appeal Brief ($160.00),
as required under 37 C.F.R. § 1.17(c), to Lexicon Genetics Incorporated Deposit Account No. 50-0892.

Appellants believe no fees in addition to the fee for filing the Appeal Brief are due in connection 
with this Appeal Brief. However, should any additional fees under 37 C.F.R. §§ 1.16 to 1.21 be required
for any reason related to this communication, the Commissioner is authorized to charge any underpayment 
or credit any overpayment to Lexicon Genetics Incorporated Deposit Account No. 50-0892. 

I. REAL PARTY IN INTEREST

The real party in interest is the Assignee, Lexicon Genetics Incorporated, 8800 Technology Forest
Place, The Woodlands, Texas, 77381.



II. RELATED APPEALS AND INTERFERENCES 

Appellants know of no related appeals or interferences that will directly affect or be directly 
affected by or have a bearing on the Board's decision in the pending appeal. 

III. STATUS OF THE CLAIMS 

The present application was filed on July 5, 2001, claiming the benefit of U.S. Provisional
Application Number 60/217,600, which was filed on July 11, 2000, and included original claims 1-5.
A Restriction and Election Requirement was issued on May 22, 2002, separating the original claims into
three separate and distinct inventions. In a response to the Restriction and Election Requirement submitted
to the Office on June 24, 2002, Appellants elected without traverse the claim of the Group III invention
(original claim 5) for prosecution on the merits, amended claim 5 to further improve its clarity, cancelled
claims 1-4 without prejudice and without disclaimer as drawn to non-elected inventions, and added new
claims 6 and 7.

A First Official Action on the merits ("the First Action") was issued on July 30, 2002, in which
claims 5-7 were rejected under 35 U.S.C. § 101 as allegedly lacking a patentable utility, claims 5-7 were
rejected under 35 U.S.C. § 112, first paragraph, as allegedly unusable by the skilled artisan due to the
alleged lack of patentable utility, and claim 5 was rejected under 35 U.S.C. § 112, second paragraph, as
allegedly indefinite. In a response to the First Official Action submitted to the Office on December 2, 2002
("Response to the First Action"), Appellants amended claim 5 to even further improve its clarity, and
addressed the rejections of claims 5-7.

A Second and Final Official Action ("the Final Action") was mailed on January 16, 2003, indicating
that the rejection of claim 5 under 35 U.S.C. § 112, second paragraph, as allegedly indefinite was
overcome by Appellants' remarks and amendment, but maintaining the rejection of claims 5-7 under
35 U.S.C. § 101 as allegedly lacking a patentable utility and under 35 U.S.C. § 112, first paragraph, as
allegedly unusable by the skilled artisan due to the alleged lack of patentable utility. In a response to the
Second and Final Office Action submitted on March 17, 2003 ("Response to the Final Action"),
Appellants again addressed the rejections of claims 5-7. An Advisory Action ("the Advisory Action") was
mailed on April 23, 2003, maintaining the rejection of claims 5-7 under 35 U.S.C. § 101 as allegedly
lacking a patentable utility and under 35 U.S.C. § 112, first paragraph, as allegedly unusable by the skilled
artisan due to the alleged lack of patentable utility. Therefore, claims 5-7 are the subject of this appeal.
A copy of the appealed claims is included below in the Appendix (Section IX).

IV. STATUS OF THE AMENDMENTS 

As no amendments subsequent to the Final Action have been filed, Appellants believe that no
outstanding amendments exist.



V. SUMMARY OF THE INVENTION 

The present invention relates to Appellants' discovery and identification of novel human 
polynucleotide sequences that encode a novel ligand binding protein (specification at page 2, lines 5-7). 

The presently claimed polynucleotide sequences were compiled from clustered human gene trapped
sequences, genomic sequences, ESTs, and cDNAs generated from human testis, skeletal muscle, mammary
gland and lymph node mRNAs (specification at page 17, lines 4-6).

The specification details a number of uses for the presently claimed polynucleotide sequences, 
including in assessing gene expression patterns, particularly using a high throughput "chip" format (see, for 
example, the specification at page 6, lines 15-17), and in mapping the sequences to a specific region of a 
human chromosome (see, for example, the specification at page 3, line 8). 

VI. ISSUES ON APPEAL

1. Do claims 5-7 lack a patentable utility?

2. Are claims 5-7 unusable by a skilled artisan due to a lack of patentable utility?

VII. GROUPING OF THE CLAIMS

For the purposes of the outstanding rejections under 35 U.S.C. § 101 and 35 U.S.C. § 112, first
paragraph, the claims will stand or fall together.

VIII. ARGUMENT

A. Do Claims 5-7 Lack a Patentable Utility?

The Final Action first rejects claims 5-7 under 35 U.S.C. § 101, as allegedly lacking a patentable
utility due to not being supported by either a specific and substantial or a well-established utility.

Appellants point out that a sequence that has over 99% identity at the protein level with the claimed
sequence is present in the leading scientific repository for biological sequence data (GenBank), and has
been annotated by third party scientists wholly unaffiliated with Appellants as a "G-protein coupled
receptor" (GenBank accession number AX647175, corresponding to European Patent Application
Number EP 1270724; alignment and front page from EP 1270724 shown in Exhibit A). The legal test
for utility simply involves an assessment of whether those skilled in the art would find any of the utilities
described for the invention to be credible or believable. Given this GenBank annotation, there can be no
question that those skilled in the art would clearly believe that Appellants' sequence is a G-protein coupled
receptor (GPCR).

The Final Action and the Advisory Action question prediction of protein function based upon
protein homology. In support of this allegation, articles by Bork and Koonin (1998, Nature Genetics
75:313-318; "Bork and Koonin"), Ji et al. (1998, J. Biol. Chem. 275:17299-17302; "Ji") and Yan et al.
(2000, Science 290:523-527; "Yan") are cited. Appellants will first set forth the shortcomings of these
particular articles, and then point out the failure of these articles to support the alleged lack of utility of the
presently claimed sequence.

First, with regard to the Bork and Koonin article, Bork and Koonin themselves conclude "(i)n
summary, the currently available methods for sequence analysis are sophisticated, and while further
improvements will certainly ensue, they are already capable of extracting subtle but functionally relevant
signals from protein sequences" (Bork and Koonin, page 317). Thus, the Bork and Koonin article is hardly
indicative of a high level of uncertainty in assigning function based on sequence, and thus does not support
the alleged lack of utility.

With regard to Ji, an exact quote from Ji completely undermines the question of asserted utility
based upon protein homology: "a substantial degree of amino acid homology is found between members
of a particular subfamily, but comparisons between subfamilies show significantly less or no similarity" (Ji
at 17299, first paragraph, emphasis added). This quote suggests that homology with members of a
G-protein coupled receptor subfamily is indicative that the particular sequence is in fact a member of that
subfamily - the fact that there is little or no homology between subfamilies is completely irrelevant. Thus, Ji
does not support the alleged lack of utility.

Furthermore, regarding Yan, this paper cites only one example, two isoforms of the anhidrotic
ectodermal dysplasia (EDA) gene, where a two amino acid change converts one isoform (EDA-A1) into
the second isoform (EDA-A2). However, while it is true that this amino acid change results in binding to
different receptors, it is important to note that the different receptors bound by the two isoforms are in fact
related (Yan at page 523). Furthermore, the EDA-A2 receptor was correctly identified as a member of
the tumor necrosis factor receptor superfamily based solely on sequence similarity (Yan at page 523).
Thus, Yan does not suggest a high level of uncertainty in assigning function based on sequence, and thus 
also does not support the alleged lack of utility. 

Furthermore, with regard to the citation of journal articles to support an allegation of a lack of utility, 
the PTO has repeatedly attempted to deny the utility of nucleic acid sequences based on a small number 
of spurious publications that call into doubt the usefulness of bioinformatic predictions, of which these three 
articles are merely the latest examples. However, Appellants point out that the lack of 100% unanimous 
agreement on the usefulness of bioinformatic prediction programs is completely irrelevant to the question 
of whether the claimed nucleic acid sequence has a substantial and specific utility. Appellants respectfully 
point out that, as discussed above, the legal test for utility simply involves an assessment of whether those 
skilled in the art would find any of the utilities described for the invention to be believable. Appellants 
submit that the overwhelming majority of those of skill in the relevant art would believe bioinformatic 
prediction to be a powerful and useful tool, as evidenced by hundreds if not thousands of journal articles, 
and would thus believe that Appellants' sequence is a GPCR. As believability is the standard for meeting
the utility requirement of 35 U.S.C. § 101, and not unanimous consent, Appellants submit that the present 
claims must clearly meet the requirements of 35 U.S.C. § 101. 

Rather, with regard to the utility of the presently claimed sequence, as 60% of the pharmaceutical
products currently being marketed by the entire industry target G-protein coupled receptors (Gurrath, 2001,
Curr. Med. Chem. 5:1605-1648; abstract presented in Exhibit B), a preponderance of the evidence
clearly weighs in favor of Appellants' assertion that the skilled artisan would readily recognize that the
presently described sequences have a specific (the claimed GPCR proteins are encoded by a specific locus
on the human genome, see below), credible, and well-established utility, for example in tracking gene
expression, as described in the specification as originally filed, at least at page 6, lines 15-17. In particular,
the specification describes how the described sequences can be represented using a gene chip format to
provide a high throughput analysis of the level of gene expression. Such "DNA chips" clearly have utility,
as evidenced by hundreds of issued U.S. Patents, as exemplified by U.S. Patent Nos. 5,445,934
(Exhibit C), 5,556,752 (Exhibit D), 5,744,305 (Exhibit E), 5,837,832 (Exhibit F), 6,156,501
(Exhibit G) and 6,261,776 (Exhibit H). Evidence of the "real world" substantial utility of the present
invention is further provided by the fact that there is an entire industry established based on the use of gene
sequences or fragments thereof in a gene chip format. Perhaps the most notable gene chip company is
Affymetrix. However, there are many companies that have, at one time or another, concentrated on the
use of gene sequences or fragments, in gene chip and non-gene chip formats, for example: Gene Logic,
ABI-Perkin-Elmer, HySeq and Incyte. In addition, one such company (Rosetta Inpharmatics) was viewed
to have such "real world" value that it was acquired by a large pharmaceutical company (Merck) for
significant sums of money (net equity value of the transaction was $620 million). The "real world"
substantial industrial utility of gene sequences or fragments would, therefore, appear to be widespread and
well established. Clearly, there can be no doubt that the skilled artisan would know how to use the
presently claimed sequences (see Section VIII(B), below), strongly arguing that the claimed sequences
have utility. Given the widespread utility of such "gene chip" methods using public domain gene sequence
information, there can be little doubt that the use of the presently described novel sequences would have
great utility in such DNA chip applications. As the present sequences are specific markers of the human
genome (see below), and such specific markers are targets for the discovery of drugs that are associated
with human disease, those of skill in the art would instantly recognize that the present nucleotide sequences
would be ideal, novel candidates for assessing gene expression using such DNA chips. Clearly,
compositions that enhance the utility of such DNA chips, such as the presently claimed nucleotide
sequences, must in themselves be useful. Thus, the present claims clearly meet the requirements of
35 U.S.C. § 101.

The Final Action questioned this utility, stating "(s)ince the disclosure does not reveal any
activity/functions of the nucleotide sequence or the protein encoded by the nucleotide sequence, one skilled
in the art would not know how to use the claimed invention" (the Final Action at page 6). However, this
argument is thwarted by the fact that skilled artisans already have used and continue to use sequences such
as Appellants' in gene chip applications. Appellants respectfully point out that this is exactly how gene
chip applications are carried out. Expression profiling does not require a knowledge of the function of the
particular nucleic acid on the chip - rather the gene chip indicates which DNA fragments are expressed at
greater or lesser levels in two or more particular tissue types. Therefore, this argument also fails to support
the alleged lack of utility of the presently claimed compositions.

Clearly, persons of skill in the art, as well as venture capitalists and investors, readily recognize the
utility, both scientific and commercial, of genomic data in general, and specifically human genomic data.
Billions of dollars have been invested in the human genome project, resulting in useful genomic data (see,
e.g., Venter et al., 2001, Science 291:1304; Exhibit I). The results have been a stunning success as the
utility of human genomic data has been widely recognized as a great gift to humanity (see, e.g., Jasny and
Kennedy, 2001, Science 291:1153; Exhibit J). Clearly, the usefulness of human genomic data, such as
the presently claimed nucleic acid molecules, is substantial and credible (worthy of billions of dollars and
the creation of numerous companies focused on such information) and well-established (the utility of human
genomic information has been clearly understood for many years).

It is important to remember that the requirement for a specific utility, which is the proper standard
under 35 U.S.C. § 101, should not be confused with the requirement for a unique utility, which is clearly
an improper standard. The fact that other expressed sequences can be used to track gene expression on
a DNA chip does not mean that the use of Appellants' sequence to track gene expression on a gene chip
is not a specific utility. As clearly stated by the Federal Circuit in Carl Zeiss Stiftung v. Renishaw PLC,
20 USPQ2d 1101 (Fed. Cir. 1991):

An invention need not be the best or only way to accomplish a certain result, and it need
only be useful to some extent and in certain applications: "[T]he fact that an invention has
only limited utility and is only operable in certain applications is not grounds for finding a
lack of utility." Envirotech Corp. v. Al George, Inc., 221 USPQ 473, 480 (Fed. Cir.
1984)

If every invention were required to have a unique utility, the Patent and Trademark Office would no longer
be issuing patents on batteries, automobile tires, golf balls, golf clubs, and treatments for a variety of human
diseases, just to name a few particular examples, because examples of each of these have already been
described and patented. However, only the briefest perusal of any issue of the Official Gazette provides
numerous examples of patents being granted on each of the above compositions every week. Furthermore,
if a composition needed to be unique to be patented, the entire class and subclass system would be an
effort in futility, as the class and subclass system serves solely to group such common inventions, which
would not be required if each invention needed to have a unique utility. Thus, the present sequence clearly
meets the requirements of 35 U.S.C. § 101.

Although Appellants need only make one credible assertion of utility to meet the requirements of
35 U.S.C. § 101 (Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); In re Gottlieb, 140 USPQ 665
(CCPA 1964); In re Malachowski, 189 USPQ 432 (CCPA 1976); Hoffman v. Klaus, 9 USPQ2d 1657
(Bd. Pat. App. & Inter. 1988)), as yet a further example of the utility of the presently claimed
polynucleotides, as described in the specification at least at page 3, line 8, the present nucleotide sequence
has a specific utility in mapping the sequences to a specific region of a human chromosome. This is
evidenced by the fact that SEQ ID NO:1 can be used to map the presently claimed sequence to
chromosome 1 (present within the chromosome 1 clone, GenBank Accession Number AC114490;
alignment and the first page from the GenBank report are presented in Exhibit K). Clearly, the present
polynucleotide provides exquisite specificity in localizing the specific region of human chromosome 1 that
contains the gene encoding the given polynucleotide, a utility not shared by virtually any other nucleic acid
sequences. In fact, it is this specificity that makes this particular sequence so useful. Early gene mapping
techniques relied on methods such as Giemsa staining to identify regions of chromosomes. However, such
techniques produced genetic maps with a resolution of only 5 to 10 megabases, far too low to be of much
help in identifying specific genes involved in disease. The skilled artisan readily appreciates the significant
benefit afforded by markers that map a specific locus of the human genome, such as the present nucleic acid
sequence. For further evidence in support of the Appellants' position, the Board is requested to review,
for example, section 3 of Venter et al. (supra at pp. 1317-1321, including Fig. 11 at pp. 1324-1325;
Exhibit I), which demonstrates the significance of expressed sequence information in the structural analysis
of genomic data. The presently claimed polynucleotide sequence defines a biologically validated sequence
that provides a unique and specific resource for mapping the genome essentially as described in the Venter
et al. article. Thus, the present claims clearly meet the requirements of 35 U.S.C. § 101.

Appellants respectfully remind the Board that only a minor percentage of the genome (2-4%)
actually encodes exons, which in turn encode amino acid sequences. The presently claimed polynucleotide
sequence provides biologically validated empirical data (e.g., showing which sequences are transcribed and
polyadenylated) that specifically define that portion of the corresponding genomic locus that actually
encodes exon sequence. Appellants respectfully submit that the practical scientific value of expressed and
polyadenylated mRNA sequences is readily apparent to those skilled in the relevant biological and
biochemical arts. Thus, the present claims clearly meet the requirements of 35 U.S.C. § 101.

It has been clearly established that a statement of utility in a specification must be accepted absent
reasons why one skilled in the art would have reason to doubt the objective truth of such statement. In re
Langer, 503 F.2d 1380, 1391, 183 USPQ 288, 297 (CCPA 1974; "Langer"); In re Marzocchi, 439
F.2d 220, 224, 169 USPQ 367, 370 (CCPA 1971). As clearly set forth in Langer:

As a matter of Patent Office practice, a specification which contains a disclosure of utility
which corresponds in scope to the subject matter sought to be patented must be taken as
sufficient to satisfy the utility requirement of § 101 for the entire claimed subject matter
unless there is a reason for one skilled in the art to question the objective truth of the
statement of utility or its scope.

Langer at 297, emphasis in original. As set forth in the MPEP, "Office personnel must provide evidence
sufficient to show that the statement of asserted utility would be considered 'false' by a person of ordinary
skill in the art" (MPEP, Eighth Edition at 2100-40, emphasis added). Thus, the present claims clearly meet
the requirements of 35 U.S.C. § 101.

Regarding the utility requirements under 35 U.S.C. § 101, the Federal Circuit has clearly stated
"(t)he threshold of utility is not high: An invention is 'useful' under section 101 if it is capable of providing
some identifiable benefit." Juicy Whip Inc. v. Orange Bang Inc., 185 F.3d 1364, 51 USPQ2d 1700
(Fed. Cir. 1999) (citing Brenner v. Manson, 383 U.S. 519, 534 (1966)). Additionally, the Federal Circuit
has stated that "(t)o violate § 101 the claimed device must be totally incapable of achieving a useful result."
Brooktree Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 1571, 24 USPQ2d 1401 (Fed. Cir.
1992), emphasis added. Cross v. Iizuka (753 F.2d 1040, 224 USPQ 739 (Fed. Cir. 1985); "Cross")
states "any utility of the claimed compounds is sufficient to satisfy 35 U.S.C. § 101." Cross at 748,
emphasis added. Indeed, the Federal Circuit recently emphatically confirmed that "anything under the sun
that is made by man" is patentable (State Street Bank & Trust Co. v. Signature Financial Group Inc.,
149 F.3d 1368, 47 USPQ2d 1596, 1600 (Fed. Cir. 1998), citing the U.S. Supreme Court's decision in
Diamond vs. Chakrabarty, 447 U.S. 303, 206 USPQ 193 (U.S. 1980)). Thus, based on the relevant
case law, the present claims clearly meet the requirements of 35 U.S.C. § 101.

The Final Action also questions the applicability of this case law, stating that "the Response cites
a device case law" and "(t)hus, applicants' argument citing a case law regarding a device is irrelevant to
the instant case" (the Final Action at pages 4 and 5). Section 101 of the Patent Act of 1952, 35 U.S.C.
§ 101, provides that "[w]hoever invents or discovers any new and useful process, machine, manufacture,
or composition of matter, or any new and useful improvement thereof," may obtain a patent on the invention
or discovery. Appellants point out that 35 U.S.C. § 101 covers devices (machine) as well as compositions,
and makes no distinction between the two with regard to meeting the burden of complying with
35 U.S.C. § 101. Furthermore, the case law in question (Juicy Whip Inc. v. Orange Bang Inc.,
51 USPQ2d 1700 (Fed. Cir. 1999)) cites Brenner v. Manson, 383 U.S. 519, 534 (1966), which the
Examiner obviously believes is not "irrelevant to the instant case", since the Examiner himself cites this exact
case three times in the Final Action (see the Final Action at pages 4, 5 and 6). Additionally, Cross and
Diamond vs. Chakrabarty, supra, do not concern devices, but rather compositions. Thus, this argument
completely fails to support the alleged lack of utility of the presently claimed compositions.

Further, in In re Brana (34 USPQ2d 1436 (Fed. Cir. 1995), "Brana"), the Federal Circuit
admonished the Patent and Trademark Office for confusing "the requirements under the law for obtaining
a patent with the requirements for obtaining government approval to market a particular drug for human
consumption". Brana at 1442. The Federal Circuit went on to state:

At issue in this case is an important question of the legal constraints on patent office
examination practice and policy. The question is, with regard to pharmaceutical inventions,
what must the applicant provide regarding the practical utility or usefulness of the invention
for which patent protection is sought. This is not a new issue; it is one which we would
have thought had been settled by case law years ago.

Brana at 1439, emphasis added. The choice of the phrase "utility or usefulness" in the foregoing quotation
is highly pertinent. The Federal Circuit is evidently using "utility" to refer to rejections under
35 U.S.C. § 101, and is using "usefulness" to refer to rejections under 35 U.S.C. § 112, first paragraph.
This is made evident in the continuing text in Brana, which explains the correlation between 35 U.S.C.
§§ 101 and 112, first paragraph. The Federal Circuit concluded:

FDA approval, however, is not a prerequisite for finding a compound useful within the
meaning of the patent laws. Usefulness in patent law, and in particular in the context of
pharmaceutical inventions, necessarily includes the expectation of further research and
development. The stage at which an invention in this field becomes useful is well before
it is ready to be administered to humans. Were we to require Phase II testing in order to
prove utility, the associated costs would prevent many companies from obtaining patent
protection on promising new inventions, thereby eliminating an incentive to pursue, through
research and development, potential cures in many crucial areas such as the treatment of
cancer.



Brana at 1442-1443, citations omitted, emphasis added. Even if, arguendo, further research might be
required in certain aspects of the present invention, this does not preclude a finding that the invention has
utility, as set forth by the Federal Circuit's holding in Brana, which clearly states, as highlighted in the quote
above, that "pharmaceutical inventions, necessarily includes the expectation of further research and
development" (Brana at 1442-1443, emphasis added). In assessing the question of whether undue
experimentation would be required in order to practice the claimed invention, the key term is "undue", not
"experimentation". In re Angstadt and Griffin, 190 USPQ 214 (CCPA 1976). The need for some
experimentation does not render the claimed invention unpatentable. Indeed, a considerable amount of
experimentation may be permissible if such experimentation is routinely practiced in the art. In re Angstadt
and Griffin, supra; Amgen, Inc. v. Chugai Pharmaceutical Co., Ltd., 18 USPQ2d 1016 (Fed. Cir.
1991). As a matter of law, it is well settled that a patent need not disclose what is well known in the art.
In re Wands, 8 USPQ2d 1400 (Fed. Cir. 1988).

Finally, while Appellants are well aware of the new Utility Guidelines set forth by the USPTO,
Appellants respectfully point out that the current rules and regulations regarding the examination of patent
applications are and always have been the patent laws as set forth in 35 U.S.C. and the patent rules as set
forth in 37 C.F.R., not the Manual of Patent Examining Procedure or particular guidelines for patent
examination set forth by the USPTO. Furthermore, it is the job of the judiciary, not the USPTO, to
interpret these laws and rules. Appellants are unaware of any significant recent changes in either
35 U.S.C. § 101, or in the interpretation of 35 U.S.C. § 101 by the Supreme Court or the Federal Circuit
that is in keeping with the new Utility Guidelines set forth by the USPTO. This is underscored by numerous
patents that have been issued over the years that claim nucleic acid fragments that do not comply with the
new Utility Guidelines. As examples of such issued U.S. Patents, the Board is invited to review U.S. Patent
Nos. 5,817,479 (Exhibit L), 5,654,173 (Exhibit M), and 5,552,281 (Exhibit N; each of which claims
short polynucleotides), and recently issued U.S. Patent No. 6,340,583 (Exhibit O; which includes no
working examples), none of which contain examples of the "real-world" utilities that the Examiner seems
to be requiring. Additionally, the Office has recently issued U.S. Patent 6,043,052 (Exhibit P), which
concerns an "orphan" G-Protein coupled receptor identified based only on homology to the orphan
receptor GPR25, similar to the situation with Appellants' currently claimed sequence. Importantly, this
issued patent also contains no examples of the "real world" utilities seemingly required in the present case.
As issued U.S. Patents are presumed to meet all of the requirements for patentability, including
35 U.S.C. §§ 101 and 112, first paragraph (see Section VIII(B), below), Appellants submit that the
present polynucleotides must also meet the requirements of 35 U.S.C. § 101. While Appellants understand
that each application is examined on its own merits, Appellants are unaware of any changes to
35 U.S.C. § 101, or in the interpretation of 35 U.S.C. § 101 by the Supreme Court or the Federal Circuit,
since the issuance of these patents that render the subject matter claimed in these patents, which is similar
to the subject matter in question in the present application, as suddenly non-statutory or failing to meet the
requirements of 35 U.S.C. § 101. Thus, holding Appellants to a different standard of utility would be
arbitrary and capricious, and, like other clear violations of due process, cannot stand.

For each of the foregoing reasons, Appellants submit that the rejection of claims 5-7 under 
35 U.S.C. § 101 must be overruled. 

B. Are Claims 5-7 Unusable Due to a Lack of Patentable Utility?

The Final Action next rejects claims 5-7 under 35 U.S.C. § 112, first paragraph, since allegedly
one skilled in the art would not know how to use the invention, as the invention allegedly is not supported 
by either a clear asserted utility or a well-established utility. 

The arguments detailed above in Section VIII(A) concerning the utility of the presently claimed
sequences are incorporated herein by reference. As the Federal Circuit and its predecessor have
determined that the utility requirement of Section 101 and the how to use requirement of Section 112, first
paragraph, have the same basis, specifically the disclosure of a credible utility (In re Brana, supra; In re
Jolles, 628 F.2d 1322, 1326 n.11, 206 USPQ 885, 889 n.11 (CCPA 1980); In re Fouche, 439 F.2d
1237, 1243, 169 USPQ 429, 434 (CCPA 1971)), Appellants submit that as claims 5-7 have been shown
to have "a specific, substantial, and credible utility", as detailed in Section VIII(A) above, the present
rejection of claims 5-7 under 35 U.S.C. § 112, first paragraph, cannot stand.

Appellants therefore submit that the rejection of claims 5-7 under 35 U.S.C. § 112, first paragraph,
must be overruled.

IX. APPENDIX

The claims involved in this appeal are as follows: 

5. (Twice Amended) An isolated nucleic acid expression vector comprising a nucleotide sequence
that:

(a) encodes the amino acid sequence shown in SEQ ID NO: 2; and

(b) hybridizes to the nucleotide sequence of SEQ ID NO:1 or the complement thereof
under highly stringent conditions of 0.5 M NaHPO4, 7% sodium dodecyl sulfate
(SDS) and 1 mM EDTA at 65°C and washing in 0.1x SSC/0.1% SDS at 68°C.

6. An isolated nucleic acid expression vector comprising the nucleotide sequence of SEQ ID
NO:1.

7. A host cell comprising the expression vector of claim 6.



X. CONCLUSION 

Appellants respectfully submit that, in light of the foregoing arguments, the Final Action's conclusion 
that claims 5-7 lack a patentable utility and are unusable by the skilled artisan due to a lack of patentable 
utility is unwarranted. It is therefore requested that the Board overturn the Final Action's rejections. 

Respectfully submitted, 

David W. Hibler Reg. No. 41,071 

Agent For Appellants 

LEXICON GENETICS INCORPORATED 
8800 Technology Forest Place 
The Woodlands, TX 77381 
(281) 863-3399 
Sbjct: 498 TCPPMRRWSSPRSSACAAAASYAVPGPGRLPAWPGAYGAPRALPAPSPGWRAWPLPAWST 677 

Query: 180 AGQARGWPPPRWPSRPPSCWCSRPT 204 

AGQARGWPPPRWPSRPPSCWCSRPT 
Sbjct: 678 AGQARGWPPPRWPSRPPSCWCSRPT 752 
(54) Guanosine triphosphate-binding protein coupled receptors 

(57) The object of the present invention is to provide 
a technique for efficiently extracting GPCR sequences 
from human genome sequences, thereby comprehensively 
identifying novel GPCRs. An original automatic 
system for identifying GPCR sequences is disclosed, 
and 1035 novel GPCRs are successfully identified from 
the entire human genome by utilizing the system. 
Over the last decades distinct members of the G Protein-Coupled Receptor 
(GPCR) family emerged as prominent drug targets within pharmaceutical 
research, since approximately 60 % of marketed prescription drugs act by 
selectively addressing representatives of that class of transmembrane signal 
transduction systems. It is noteworthy that the majority of GPCR-targeted 
drugs elicit their biological activity by selective agonism or antagonism of 
biogenic monoamine receptors, while the development status of peptide- 
binding GPCR-addressing compounds is still in its infancy. Exemplified on 
selected medicinal chemistry projects, this review will focus on the 
opportunities of therapeutic intervention into a broad spectrum of disease 
processes through agonizing or antagonizing the functions of peptide-binding 
GPCRs. In this context, a brief overview of GPCR-mediated signal 
transduction pathways will be given in order to emphasize the biomedical 
relevance of a controlled modulation of receptor function. Modern trends on 
lead finding and optimization strategies for peptide-binding GPCR-targeted 
low-molecular weight compounds will be highlighted on the basis of current 
research programs conducted in the areas of angiotensin II, endothelin, 
bradykinin, neurokinin, neuropeptide Y, LHRH, C5a antagonists, and 
somatostatin agonists, respectively. Special emphasis will be laid on the 
elaboration and utilization of structural rationales on the potential drug 
candidates, thus facilitating more detailed insights into the underlying 
molecular recognition event. 
ABSTRACT 



A method and apparatus for preparation of a substrate 
containing a plurality of sequences. Photoremovable 
groups are attached to a surface of a substrate. Selected 
regions of the substrate are exposed to light so as to 
activate the selected areas. A monomer, also containing 
a photoremovable group, is provided to the substrate to 
bind at the selected areas. The process is repeated using 
a variety of monomers such as amino acids until sequen- 
ces of a desired length are obtained. Detection methods 
and apparatus are also disclosed. 
ARRAY OF OLIGONUCLEOTIDES ON A SOLID SUBSTRATE 

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application is a Rule 60 Division of U.S. application Ser. No. 850,356, filed Mar. 12, 1992, 
which is a Rule 60 Division of U.S. application Ser. No. 492,462, filed Mar. 7, 1990, now U.S. Pat. 
No. 5,143,854, which is a Continuation-in-Part of U.S. application Ser. No. 362,901, filed Jun. 7, 
1989, now abandoned, all assigned to the assignee of the present invention. 

The file of this patent contains drawings executed in color. Copies of this patent with color 
drawings will be provided by the Patent and Trademark Office upon request and payment of the 
necessary fee. 

COPYRIGHT NOTICE 

A portion of the disclosure of this patent document contains material which is subject to copyright 
protection. The copyright owner has no objection to the facsimile reproduction by anyone of the 
patent document or the patent disclosure as it appears in the Patent and Trademark Office patent 
file or records, but otherwise reserves all copyright rights whatsoever. 

BACKGROUND OF THE INVENTION 

The present inventions relate to the synthesis and placement of materials at known locations. In 
particular, one embodiment of the inventions provides a method and associated apparatus for 
preparing diverse chemical sequences at known locations on a single substrate surface. The 
inventions may be applied, for example, in the field of preparation of oligomer, peptide, nucleic 
acid, oligosaccharide, phospholipid, polymer, or drug congener preparation, especially to create 
sources of chemical diversity for use in screening for biological activity. 

The relationship between structure and activity of molecules is a fundamental issue in the study of 
biological systems. Structure-activity relationships are important in understanding, for example, 
the function of enzymes, the ways in which cells communicate with each other, as well as cellular 
control and feedback systems. 

Certain macromolecules are known to interact and bind to other molecules having a very specific 
three-dimensional spatial and electronic distribution. Any large molecule having such specificity 
can be considered a receptor, whether it is an enzyme catalyzing hydrolysis of a metabolic 
intermediate, a cell-surface protein mediating membrane transport of ions, a glycoprotein serving 
to identify a particular cell to its neighbors, an IgG-class antibody circulating in the plasma, an 
oligonucleotide sequence of DNA in the nucleus, or the like. The various molecules which receptors 
selectively bind are known as ligands. 

Many assays are available for measuring the binding affinity of known receptors and ligands, but the 
information which can be gained from such experiments is often limited by the number and type of 
ligands which are available. Novel ligands are sometimes discovered by chance or by application of 
new techniques for the elucidation of molecular structure, including x-ray crystallographic analysis 
and recombinant genetic techniques for proteins. 

Small peptides are an exemplary system for exploring the relationship between structure and function 
in biology. A peptide is a sequence of amino acids. When the twenty naturally occurring amino acids 
are condensed into polymeric molecules they form a wide variety of three-dimensional configurations, 
each resulting from a particular amino acid sequence and solvent condition. The number of possible 
pentapeptides of the 20 naturally occurring amino acids, for example, is 20^5 or 3.2 million 
different peptides. The likelihood that molecules of this size might be useful in receptor-binding 
studies is supported by epitope analysis studies showing that some antibodies recognize sequences as 
short as a few amino acids with high specificity. Furthermore, the average molecular weight of amino 
acids puts small peptides in the size range of many currently useful pharmaceutical products. 

Pharmaceutical drug discovery is one type of research which relies on such a study of 
structure-activity relationships. In most cases, contemporary pharmaceutical research can be 
described as the process of discovering novel ligands with desirable patterns of specificity for 
biologically important receptors. Another example is research to discover new compounds for use in 
agriculture, such as pesticides and herbicides. 

Sometimes, the solution to a rational process of designing ligands is difficult or unyielding. Prior 
methods of preparing large numbers of different polymers have been painstakingly slow when used at a 
scale sufficient to permit effective rational or random screening. For example, the "Merrifield" 
method (J. Am. Chem. Soc., 85:2149-2154, which is incorporated herein by reference for all purposes) 
has been used to synthesize peptides on a solid support. In the Merrifield method, an amino acid is 
covalently bonded to a support made of an insoluble polymer. Another amino acid with an alpha 
protected group is reacted with the covalently bonded amino acid to form a dipeptide. After washing, 
the protective group is removed and a third amino acid with an alpha protective group is added to 
the dipeptide. This process is continued until a peptide of a desired length and sequence is 
obtained. Using the Merrifield method, it is not economically practical to synthesize more than a 
handful of peptide sequences in a day. 

To synthesize larger numbers of polymer sequences, it has also been proposed to use a series of 
reaction vessels for polymer synthesis. For example, a tubular reactor system may be used to 
synthesize a linear polymer on a solid phase support by automated sequential addition of reagents. 
This method still does not enable the synthesis of a sufficiently large number of polymer sequences 
for effective economical screening. 

Methods of preparing a plurality of polymer sequences are also known in which a porous container 
encloses a known quantity of reactive particles, the particles being larger in size than pores of 
the container. The containers may be selectively reacted with desired materials to synthesize 
desired sequences of product molecules. As with other methods known in the art, this method cannot 
practically be used to synthesize a sufficient variety of polypeptides for effective screening. 

Other techniques have also been described. These methods include the synthesis of peptides on 96 
plastic pins which fit the format of standard microtiter plates. Unfortunately, while these 
techniques have been somewhat useful, substantial problems remain. For example, these methods 
continue to be limited in the diversity of sequences which can be economically synthesized and 
screened. 
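
As an arithmetic check on the pentapeptide count given in the Background, the number of distinct linear sequences of length n drawn from an alphabet of k monomers is k^n. The short Python sketch below is illustrative only; the function name is an assumption and does not appear in the patent.

```python
def count_sequences(alphabet_size: int, length: int) -> int:
    """Number of distinct linear sequences of the given length.

    Hypothetical helper for illustration; not part of the patent text.
    """
    return alphabet_size ** length


# 20 naturally occurring amino acids, sequences of length 5
pentapeptides = count_sequences(20, 5)
print(pentapeptides)  # 3200000
```

The result, 3,200,000, agrees with the figure of 3.2 million stated in the text.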
From the above, it is seen that an improved method and apparatus for synthesizing a variety of 
chemical sequences at known locations is desired. 

SUMMARY OF THE INVENTION 

An improved method and apparatus for the preparation of a variety of polymers is disclosed. 

In one preferred embodiment, linker molecules are provided on a substrate. A terminal end of the 
linker molecules is provided with a reactive functional group protected with a photoremovable 
protective group. Using lithographic methods, the photoremovable protective group is exposed to 
light and removed from the linker molecules in first selected regions. The substrate is then washed 
or otherwise contacted with a first monomer that reacts with exposed functional groups on the linker 
molecules. In a preferred embodiment, the monomer is an amino acid containing a photoremovable 
protective group at its amino or carboxy terminus and the linker molecule terminates in an amino or 
carboxy acid group bearing a photoremovable protective group. 

A second set of selected regions is, thereafter, exposed to light and the photoremovable protective 
group on the linker molecule/protected amino acid is removed at the second set of regions. The 
substrate is then contacted with a second monomer containing a photoremovable protective group for 
reaction with exposed functional groups. This process is repeated to selectively apply monomers 
until polymers of a desired length and desired chemical sequence are obtained. Photolabile groups 
are then optionally removed and the sequence is, thereafter, optionally capped. Side chain 
protective groups, if present, are also removed. 

By using the lithographic techniques disclosed herein, it is possible to direct light to relatively 
small and precisely known locations on the substrate. It is, therefore, possible to synthesize 
polymers of a known chemical sequence at known locations on the substrate. 

The resulting substrate will have a variety of uses including, for example, screening large numbers 
of polymers for biological activity. To screen for biological activity, the substrate is exposed to 
one or more receptors such as antibodies, whole cells, receptors on vesicles, lipids, or any one of 
a variety of other receptors. The receptors are preferably labeled with, for example, a fluorescent 
marker, radioactive marker, or a labeled antibody reactive with the receptor. The location of the 
marker on the substrate is detected with, for example, photon detection or autoradiographic 
techniques. Through knowledge of the sequence of the material at the location where binding is 
detected, it is possible to quickly determine which sequence binds with the receptor and, therefore, 
the technique can be used to screen large numbers of peptides. Other possible applications of the 
inventions herein include diagnostics in which various antibodies for particular receptors would be 
placed on a substrate and, for example, blood sera would be screened for immune deficiencies. Still 
further applications include, for example, selective "doping" of organic materials in semiconductor 
devices, and the like. 

In connection with one aspect of the invention an improved reactor system for synthesizing polymers 
is also disclosed. The reactor system includes a substrate mount which engages a substrate around a 
periphery thereof. The substrate mount provides for a reactor space between the substrate and the 
mount through or into which reaction fluids are pumped or flowed. A mask is placed on or focused on 
the substrate and illuminated so as to deprotect selected regions of the substrate in the reactor 
space. A monomer is pumped through the reactor space or otherwise contacted with the substrate and 
reacts with the deprotected regions. By selectively deprotecting regions on the substrate and 
flowing predetermined monomers through the reactor space, desired polymers at known locations may be 
synthesized. 

Improved detection apparatus and methods are also disclosed. The detection method and apparatus 
utilize a substrate having a large variety of polymer sequences at known locations on a surface 
thereof. The substrate is exposed to a fluorescently labeled receptor which binds to one or more of 
the polymer sequences. The substrate is placed in a microscope detection apparatus for 
identification of locations where binding takes place. The microscope detection apparatus includes a 
monochromatic or polychromatic light source for directing light at the substrate, means for 
detecting fluoresced light from the substrate, and means for determining a location of the 
fluoresced light. The means for detecting light fluoresced on the substrate may in some embodiments 
include a photon counter. The means for determining a location of the fluoresced light may include 
an x/y translation table for the substrate. Translation of the slide and data collection are 
recorded and managed by an appropriately programmed digital computer. 

A further understanding of the nature and advantages of the inventions herein may be realized by 
reference to the remaining portions of the specification and the attached drawings. 

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 illustrates masking and irradiation of a substrate at a first location. The substrate is 
shown in cross-section; 

FIG. 2 illustrates the substrate after application of a monomer "A"; 

FIG. 3 illustrates irradiation of the substrate at a second location; 

FIG. 4 illustrates the substrate after application of monomer "B"; 

FIG. 5 illustrates irradiation of the "A" monomer; 

FIG. 6 illustrates the substrate after a second application of "B"; 

FIG. 7 illustrates a completed substrate; 

FIGS. 8A and 8B illustrate alternative embodiments of a reactor system for forming a plurality of 
polymers on a substrate; 

FIG. 9 illustrates a detection apparatus for locating fluorescent markers on the substrate; 

FIGS. 10A-10M illustrate the method as it is applied to production of the trimers of monomers "A" 
and "B"; 

FIGS. 11A and 11B are fluorescence traces for standard fluorescent beads; 

FIGS. 12A and 12B are fluorescence curves for NVOC (6-nitroveratryloxycarbonyl) slides not exposed 
and exposed to light respectively; 

FIGS. 13A to 13D are fluorescence plots of slides exposed through 100 μm, 50 μm, 20 μm, and 10 μm 
masks; 

FIGS. 14A and 14B illustrate formation of YGGFL (a peptide of sequence 
H2N-tyrosine-glycine-glycine-phenylalanine-leucine-CO2H) and GGFL (a peptide of sequence 
H2N-glycine-glycine-phenylalanine-leucine-CO2H), followed by exposure to labeled Herz antibody 
(an antibody that recognizes YGGFL but not GGFL); 
FIGS. 15A and 15B are fluorescence plots of a slide with 
a checkerboard pattern of YGGFL and GGFL exposed 
to labeled Herz antibody; FIG. 15A illustrates a 
500 x 500 μm mask which has been focused on the substrate 
according to FIG. 8A while FIG. 15B illustrates 
a 50 x 50 μm mask placed in direct contact with the 
substrate in accord with FIG. 8B; 

FIG. 16 is a fluorescence plot of YGGFL and 
PGGFL synthesized in a 50 μm checkerboard pattern; 

FIG. 17 is a fluorescence plot of YPGGFL and 
YGGFL synthesized in a 50 μm checkerboard pattern; 

FIGS. 18A and 18B illustrate the mapping of sixteen 
sequences synthesized on two different glass slides; 

FIG. 19 is a fluorescence plot of the slide illustrated 
in FIG. 18A; and 

FIG. 20 is a fluorescence plot of the slide illustrated 
in FIG. 18B. 
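
The light-directed synthesis cycle described in the Summary (expose selected regions, couple a monomer, repeat) and the later read-out of binding locations can be illustrated with a minimal Python sketch. It is a bookkeeping model only, under stated assumptions: the region labels, the function names `synthesize` and `sequences_at`, and the use of plain strings for sequences are hypothetical and are not taken from the patent, and no chemistry (protective groups, coupling yield) is modeled.

```python
def synthesize(regions, cycles):
    """Track which sequence is built in each region of a substrate.

    regions: iterable of region labels (hypothetical names).
    cycles:  list of (exposed_regions, monomer) pairs, applied in order.
    Returns a dict mapping each region to the sequence built there,
    written in order of monomer addition.
    """
    sequences = {region: "" for region in regions}
    for exposed, monomer in cycles:
        for region in exposed:
            # Only regions exposed (deprotected) in this cycle couple the monomer.
            sequences[region] += monomer
    return sequences


def sequences_at(layout, bright_regions):
    """Look up the known sequences at regions where binding was detected."""
    return {region: layout[region] for region in bright_regions}


regions = ["r1", "r2", "r3", "r4"]
cycles = [
    ({"r1", "r2"}, "A"),  # first mask, monomer "A"
    ({"r3", "r4"}, "B"),  # second mask, monomer "B"
    ({"r1", "r3"}, "A"),  # third mask, monomer "A"
    ({"r2", "r4"}, "B"),  # fourth mask, monomer "B"
]
layout = synthesize(regions, cycles)
print(layout)  # {'r1': 'AA', 'r2': 'AB', 'r3': 'BA', 'r4': 'BB'}
print(sequences_at(layout, ["r2"]))  # {'r2': 'AB'}
```

Because the mask used in each cycle is known, the sequence at every region is known without any analysis of the substrate itself, which is the property the screening step relies on.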
I. Glossary 

The following terms are intended to have the following 
general meanings as they are used herein: 

1. Complementary: Refers to the topological compatibility 
or matching together of interacting surfaces of 
a ligand molecule and its receptor. Thus, the receptor 
and its ligand can be described as complementary, 
and furthermore, the contact surface characteristics 
are complementary to each other. 

2. Epitope: The portion of an antigen molecule which is 
delineated by the area of interaction with the subclass 
of receptors known as antibodies. 

3. Ligand: A ligand is a molecule that is recognized by 
a particular receptor. Examples of ligands that can be 
investigated by this invention include, but are not 
restricted to, agonists and antagonists for cell membrane 
receptors, toxins and venoms, viral epitopes, 
hormones (e.g., steroids, etc.), hormone receptors, 
peptides, enzymes, enzyme substrates, cofactors, 
drugs (e.g., opiates, etc.), lectins, sugars, oligonucleotides, 
nucleic acids, oligosaccharides, proteins, and 
monoclonal antibodies. 

4. Monomer: A member of the set of small molecules 
which can be joined together to form a polymer. The 
set of monomers includes but is not restricted to, for 
example, the set of common L-amino acids, the set of 
D-amino acids, the set of synthetic amino acids, the 
set of nucleotides and the set of pentoses and hexoses. 
As used herein, monomers refers to any member of a 
basis set for synthesis of a polymer. For example, 
dimers of L-amino acids form a basis set of 400 mono- 
mers for synthesis of polypeptides. Different basis 
sets of monomers may be used at successive steps in 
the synthesis of a polymer. 

5. Peptide: A polymer in which the monomers are alpha 
amino acids and which are joined together through 
amide bonds and alternatively referred to as a poly- 
peptide. In the context of this specification it should 
be appreciated that the amino acids may be the L- 
optical isomer or the D-optical isomer. Peptides are 
more than two amino acid monomers long, and often 
more than 20 amino acid monomers long. Standard 
abbreviations for amino acids are used (e.g., P for 
proline). These abbreviations are included in Stryer, 
Biochemistry, Third Ed., 1988, which is incorporated 
herein by reference for all purposes. 

6. Radiation: Energy which may be selectively applied 
including energy having a wavelength of between 
10^-14 and 10^4 meters including, for example, electron 
beam radiation, gamma radiation, x-ray radiation, 
ultraviolet radiation, visible light, infrared radiation, 
microwave radiation, and radio waves. "Irradiation" 
refers to the application of radiation to a surface. 

7. Receptor: A molecule that has an affinity for a given 
ligand. Receptors may be naturally-occurring or manmade 
molecules. Also, they can be employed in their 
unaltered state or as aggregates with other species. 
Receptors may be attached, covalently or noncovalently, 
to a binding member, either directly or via a 
specific binding substance. Examples of receptors 
which can be employed by this invention include, but 
are not restricted to, antibodies, cell membrane receptors, 
monoclonal antibodies and antisera reactive 
with specific antigenic determinants (such as on viruses, 
cells or other materials), drugs, polynucleotides, 
nucleic acids, peptides, cofactors, lectins, sugars, 
polysaccharides, cells, cellular membranes, and 
organelles. Receptors are sometimes referred to in the 
art as anti-ligands. As the term receptors is used 
herein, no difference in meaning is intended. A "Ligand 
Receptor Pair" is formed when two macromolecules 
have combined through molecular recognition 
to form a complex. 

Other examples of receptors which can be investi- 
gated by this invention include but are not restricted to: 

a) Microorganism receptors: Determination of ligands 
which bind to receptors, such as specific 
transport proteins or enzymes essential to survival 
of microorganisms, is useful in a new class of antibiotics. 
Of particular value would be antibiotics 
against opportunistic fungi, protozoa, and those 
bacteria resistant to the antibiotics in current use. 
b) Enzymes: For instance, the binding site of enzymes such as the enzymes responsible for cleaving 
neurotransmitters; determination of ligands which bind to certain receptors to modulate the action 
of the enzymes which cleave the different neurotransmitters is useful in the development of drugs 
which can be used in the treatment of disorders of neurotransmission. 

c) Antibodies: For instance, the invention may be useful in investigating the ligand-binding site on 
the antibody molecule which combines with the epitope of an antigen of interest; determining a 
sequence that mimics an antigenic epitope may lead to the development of vaccines of which the 
immunogen is based on one or more of such sequences or lead to the development of related diagnostic 
agents or compounds useful in therapeutic treatments such as for auto immune diseases (e.g., by 
blocking the binding of the "self" antibodies). 

d) Nucleic Acids: Sequences of nucleic acids may be synthesized to establish DNA or RNA binding 
sequences. 

e) Catalytic Polypeptides: Polymers, preferably polypeptides, which are capable of promoting a 
chemical reaction involving the conversion of one or more reactants to one or more products. Such 
polypeptides generally include a binding site specific for at least one reactant or reaction 
intermediate and an active functionality proximate to the binding site, which functionality is 
capable of chemically modifying the bound reactant. Catalytic polypeptides are described in, for 
example, U.S. application Ser. No. 404,920, which is incorporated herein by reference for all 
purposes. 

f) Hormone receptors: For instance, the receptors for insulin and growth hormone. Determination of 
the ligands which bind with high affinity to a receptor is useful in the development of, for 
example, an oral replacement of the daily injections which diabetics must take to relieve the 
symptoms of diabetes, and in the other case, a replacement for the scarce human growth hormone which 
can only be obtained from cadavers or by recombinant DNA technology. Other examples are the 
vasoconstrictive hormone receptors; determination of those ligands which bind to a receptor may lead 
to the development of drugs to control blood pressure. 

g) Opiate receptors: Determination of ligands which bind to the opiate receptors in the brain is 
useful in the development of less-addictive replacements for morphine and related drugs. 

8. Substrate: A material having a rigid or semi-rigid surface. In many embodiments, at least one 
surface of the substrate will be substantially flat, although in some embodiments it may be 
desirable to physically separate synthesis regions for different polymers with, for example, wells, 
raised regions, etched trenches, or the like. According to other embodiments, small beads may be 
provided on the surface which may be released upon completion of the synthesis. 

9. Protective Group: A material which is bound to a monomer unit and which may be spatially removed 
upon selective exposure to an activator such as electromagnetic radiation. Examples of protective 
groups with utility herein include Nitroveratryloxy carbonyl, Nitrobenzyloxy carbonyl, Dimethyl 
dimethoxybenzyloxy carbonyl, 5-Bromo-7-nitroindolinyl, o-Hydroxy-α-methyl cinnamoyl, and 
2-Oxymethylene anthraquinone. Other examples of activators include ion beams, electric fields, 
magnetic fields, electron beams, x-ray, and the like. 

10. Predefined Region: A predefined region is a localized area on a surface which is, was, or is 
intended to be activated for formation of a polymer. The predefined region may have any convenient 
shape, e.g., circular, rectangular, elliptical, wedge-shaped, etc. For the sake of brevity herein, 
"predefined regions" are sometimes referred to simply as "regions." 

11. Substantially Pure: A polymer is considered to be "substantially pure" within a predefined 
region of a substrate when it exhibits characteristics that distinguish it from other predefined 
regions. Typically, purity will be measured in terms of biological activity or function as a result 
of uniform sequence. Such characteristics will typically be measured by way of binding with a 
selected ligand or receptor. 

II. General 

The present invention provides methods and apparatus for the preparation and use of a substrate 
having a plurality of polymer sequences in predefined regions. The invention is described herein 
primarily with regard to the preparation of molecules containing sequences of amino acids, but could 
readily be applied in the preparation of other polymers. Such polymers include, for example, both 
linear and cyclic polymers of nucleic acids, polysaccharides, phospholipids, and peptides having 
either α-, β-, or ω-amino acids, heteropolymers in which a known drug is covalently bound to any of 
the above, polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, 
polyarylene sulfides, polysiloxanes, polyimides, polyacetates, or other polymers which will be 
apparent upon review of this disclosure. In a preferred embodiment, the invention herein is used in 
the synthesis of peptides. 

The prepared substrate may, for example, be used in screening a variety of polymers as ligands for 
binding with a receptor, although it will be apparent that the invention could be used for the 
synthesis of a receptor for binding with a ligand. The substrate disclosed herein will have a wide 
variety of other uses. Merely by way of example, the invention herein can be used in determining 
peptide and nucleic acid sequences which bind to proteins, finding sequence-specific binding drugs, 
identifying epitopes recognized by antibodies, and evaluation of a variety of drugs for clinical and 
diagnostic applications, as well as combinations of the above. 

The invention preferably provides for the use of a substrate "S" with a surface. Linker molecules 
"L" are optionally provided on a surface of the substrate. The purpose of the linker molecules, in 
some embodiments, is to facilitate receptor recognition of the synthesized polymers. 

Optionally, the linker molecules may be chemically protected for storage purposes. A chemical 
storage protective group such as t-BOC (t-butoxycarbonyl) may be used in some embodiments. Such 
chemical protective groups would be chemically removed upon exposure to, for example, acidic 
solution and would serve to protect the surface during storage and be removed prior to polymer 
preparation. 

On the substrate or a distal end of the linker molecules, a functional group with a protective group 
P0 is provided. The protective group P0 may be removed upon exposure to radiation, electric fields, 
electric currents, or other activators to expose the functional group. 

In a preferred embodiment, the radiation is ultraviolet (UV), infrared (IR), or visible light. As 
more fully described below, the protective group may alternatively be an electrochemically-sensitive 
group which may be removed in the presence of an electric field. In still further alternative 
embodiments, ion beams, electron beams, or the like may be used for deprotection. 

In some embodiments, the exposed regions and, therefore, the area upon which each distinct polymer 
sequence is synthesized are smaller than about 1 cm^2 or less than 1 mm^2. In preferred embodiments 
the exposed area is less than about 10,000 μm^2 or, more preferably, less than 100 μm^2 and may, in 
some embodiments, encompass the binding site for as few as a single molecule. Within these regions, 
each polymer is preferably synthesized in a substantially pure form. 

Concurrently or after exposure of a known region of the substrate to light, the surface is contacted 
with a first monomer unit M1 which reacts with the functional group which has been exposed by the 
deprotection step. The first monomer includes a protective group P1. P1 may or may not be the same 
as P0. 

Accordingly, after a first cycle, known first regions of the surface may comprise the sequence: 

S-L-M1-P1 

while remaining regions of the surface comprise the sequence: 

S-L-P0. 

Thereafter, second regions of the surface (which may include the first region) are exposed to light 
and contacted with a second monomer M2 (which may or may not be the same as M1) having a protective 
group P2. P2 may or may not be the same as P0 and P1. After this second cycle, different regions of 
the substrate may comprise one or more of the following sequences: 

S-L-M1-M2-P2 
S-L-M2-P2 
S-L-M1-P1 and/or 
S-L-P0. 

The above process is repeated until the substrate includes desired polymers of desired lengths. By 
controlling the locations of the substrate exposed to light and the reagents exposed to the 
substrate following exposure, the location of each sequence will be known. 

Thereafter, the protective groups are removed from some or all of the substrate and the sequences 
are, optionally, capped with a capping unit C. The process results in a substrate having a surface 
with a plurality of polymers of the following general formula: 

... followed by contacting with M1-P, resulting in the sequence S-M1-P at the first location. The 
second locations would then be irradiated and contacted with M4-P, resulting in the sequence S-M4-P 
at the second locations. Thereafter both the first and second locations would be irradiated and 
contacted with the dimer M2-M3, resulting in the sequence S-M1-M2-M3 at the first locations and 
S-M4-M2-M3 at the second locations. Of course, common subsequences of any length could be utilized 
including those in a range of 2 or more monomers, 2 to 100 monomers, 2 to 20 monomers, and a most 
preferred range of 2 to 3 monomers. 

According to other embodiments, a set of masks is used for the first monomer layer and, thereafter, 
varied light wavelengths are used for selective deprotection. For example, in the process discussed 
above, first regions are first exposed through a mask and reacted with a first monomer having a 
first protective group P1, which is removable upon exposure to a first wavelength of light (e.g., 
IR). Second regions are masked and reacted with a second monomer having a second protective group 
P2, which is removable upon exposure to a second wavelength of light (e.g., UV). Thereafter, masks 
become unnecessary in the synthesis because the entire substrate may be exposed alternatively to the 
first and second wavelengths of light in the deprotection cycle. 

The polymers prepared on a substrate according to the above methods will have a variety of uses 
including, for example, screening for biological activity. In such screening activities, the 
substrate containing the sequences is exposed to an unlabeled or labeled receptor such as an 
antibody, receptor on a cell, phospholipid vesicle, or any one of a variety of other receptors. In 
one preferred embodiment the polymers are exposed to a first, unlabeled receptor of interest and, 
thereafter, exposed to a labeled receptor-specific recognition element, which is, for example, an 
antibody. This process will provide signal amplification in the detection stage. 

The receptor molecules may bind with one or more polymers on the substrate. The presence of the 
labeled receptor and, therefore, the presence of a sequence which binds with the receptor is 
detected in a preferred embodiment through the use of autoradiography, detection of fluorescence 
with a charge-coupled device, fluorescence microscopy, or the like. The sequence of the polymer at 
the locations where the receptor binding is detected may be used to determine all or part of a 
sequence which is complementary to the receptor. 

Use of the invention herein is illustrated primarily with reference to screening for biological 
activity. The invention will, however, find many other uses. For example, the invention may be used 
in information stor- 

55 age (e.g., on optical disks), production of molecular 
S-{LHMXM|eKMjO . . . (MjiHC] electronic devices, production of stationary phases in 

separation sciences, production of dyes and brightening 
where square brackets indicate optional groups, and M/ . agents, photography, and in immobilization of cells, 
. . . Mx indicates any sequence of monomers. The num- proteins, lectins, nucleic acids, polysaccharides and the 
ber of monomers could cover a wide variety of values, 60 like in patterns on a surface via molecular recognition of 
but in a preferred embodiment they will range ft-om 2 to specific polymer sequences. By synthesizing the same 
100. compound in adjacent, progressively differing concen- 

In some embodiments a plurality of locations on the trations, a gradient will be established to control chemo- 
substrate polymers are to contain a common monomer taxis or to develop diagnostic dipsticks which, for ex- 
subsequence. For example, it may be desired to synthe- 65 ample, titrate an antibody against an increasing amount 
size a sequence S-M1-M2-M3 at first locations and a of antigen. By synthesizing several catalyst molecules in 
sequence S-M4-M2-M3 at second locations. The process close proximity, more efficient multistep conversions 
would commence with irradiation of the first locations may be achieved by "coordinate immobilization." Co- 
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ordinate immobilization also may be used for electron completed substrate to interact freely with molecules 
transfer systems, as well as to provide both structural exposed to the substrate. The linker molecules should 
integrity and other desirable properties to materials be 6-50 atoms long to provide sufficient exposure. The 
such as lubrication, wetting, etc. linker molecules may be, for example, aryl acetylene. 

According to alternative embodiments, molecular 5 ethylene glycol oligomers containing 2-10 monomer 
. biodistribution or pharmacokinetic properties may be units, diamines, diacids, amino acids, or combinations 
examined. For example, to assess resistance to intestinal thereof. Other linker molecules may be used in light of 
or serum proteases, polymers may be capped with a this disclsoure. 

fluorescent tag and exposed to biological fluids of inter- According to alternative embodiments, the linker 
est 10 molecules are selected based upon their hydrophilicA 

III. Polymer Synthesis hydrophobic properties to improve presentation of syn- 

FIG. 1 illustrates one embodiment of the invention thesized polymers to certain receptors. For example, in 
disclosed herein in which a substrate 2 is shown in the case of a hydrophilic receptor, hydrophilic linker 
cross-section. Essentially, any conceivable substrate molecules wOl be preferred so as to permit the receptor 
may be employed in the invention. The substrate may 15 more closely approach the synthesized polymer, 
be biological, nonbiological, organic, inorganic, or a According to another alternative embodiment, linker 
combination of any of these, existing as particles. molecules are also provided with a photocleavable 
strands, precipitates, gels, sheets, tubing, spheres, con- ^ intermediate position. The photocleavable 

miners, capillaries, pads. sUces. filnK, plates, slides etc. J3 preferably cleavable at a wavelength different 

The substrate niay have any convement shape, such as a 20 protective group. This enables removal of the 

disc, square, sphere, cucle, etc. The substrate is prefera- ^^^^ polymers foUowing completion of the synthesis 
bly flat but may take on a variety of alternative surface ^ ^^^^^ wavelengths of 

configurations. For example, the substrate may contam UbKl 

raised or depr«sed regions on which the synthesis takes ^n^^ molecules can be attached to the substrate 
pla«. The substrate and its surface preferably fonn a 25 ^ carbon-carbon bonds using, for example, (poly)tri- 
ngid support on which to carry out the reactions de- , , ^, , _f r if -i 

scribed herein. The substrate and its surface is also AuorocWoroethylene surfaces, or preferably, by silox- 
chosen to provide appropriate light-absorbing charac- ane bonds (usmg. for ^cample, glass or sihcon oxide 
teristics. For instance, the substrate may be a polymer- ^J^^ff ^^oxane bonds with surface of the sub- 
ized Langmuir Blodgett film, functionalized glass. Si. 30 s^te may be formed m one embodmien wa reactions 
Ge. GaAs. GaP. SiOi, SIN4. modified siHcon. or any f^^^^' molecules bearmg tnchlorosilyl groups. The 
oneofawidevarietyofgelsorpolymerssuchas(poly>. ^er molecules may optionaUy be atta^^ 
tetrafluoroethylene. (poly)vinyUdenedifluoride. poly- <^^^^^ head groups m a poly- 

styrene, polycarbonate, or combinations thereof. Other ' menzed Langmuir Blodgett fflm. In alternative embodi- 
substrate materials will be readUy apparent to those of 35 molecules are adsorbed to the surface 

skill in the art upon review of this disclosure. In a pre- ^® substrate. 

ferred embodiment the substrate is flat glass or single- molecules and monomers used hercm are 

crystal silicon with surface relief features of less than 10 provided with a functional group to which is bound a 
A. protective group. Preferably, the protective group is on 

According to some embodiments, the surface of the 40 the distal or terminal end of the linker molecule oppo- 
substrate is etched using well known techniques to pro- site the substrate. The protective group may be either a 
vide for desired surface features. For example, by way negative protective group O-c, the protective group 
of the formation of trenches, v-grooves, mesa stnic- renders the linker molecules less reactive with a mono- 
tures, or the like, the synthesis regions may be more mer upon exposure) or a positive protective group O-c. 
closely placed within the focus point of impinging light. 45 the protective group renders the linker molecules more 
be provided with reflective "mirror" structures for reactive with a monomer upon exposure). In the case of 
maximization of light collection from fluorescent negative protective groups an additional step of r^ti- 
sources, or the like. vation wQl be required. In some embodiments, this will 

Surfaces on the solid substrate will usuaUy, though be done by heating, 
not always, be composed of the same material as the 50 The protective group on the linker molecules may be 
substrate. Thus, the surface may be composed of any of selected from a wide variety of positive light-reactive 
a wide variety of materials, for example, polymers, groups preferably including nitro aromatic compounds 
plastics, resins, polysaccharides, silica or silica-based such as o-nitrobenzyl derivatives or benzylsulfonyL In a 
materials, carbon, metals, inorganic glasses, membranes, preferred embedment, 6-nitroveratryloxycarboayl 
or any of the above-listed substrate materials. In some 55 (NVOQ, 2-nitrobeiizyIoxycarbonyl (NBOQ or a,a- 
embodiments the surface may provide for the use of dimethyl-dimethoxyb^ozyloxycarbonyl (DDZ) is used, 
caged binding members which are attached firmly to In one embodiment; a nitro aromatic compound con- 
the surface of the substrate. Preferably, the surface will taining a benzylic hydrogen oxtho to the nitro group is 
contain reactive groups, which could be caiboxyl, used, Le., a chemical of the form: 
amino, hydroxyl, or the like. Most preferably, the sur- 60 
face will be optically transparent and will have surface 
Si — OH functionalities, such as are found on silica sur- 
faces. 

The surface 4 of the substrate is preferably provided 
with a layer of linker molecules 6, although it will be 65 
understood that the linker molecules are not required 
elements of the invention. The linker molecules are 
preferably of sufficient length to permit polymers in a 
where R1 is alkoxy, alkyl, halo, aryl, alkenyl, or hydrogen; R2 is alkoxy,
alkyl, halo, aryl, nitro, or hydrogen; R3 is alkoxy, alkyl, halo, nitro,
aryl, or hydrogen; R4 is alkoxy, alkyl, hydrogen, aryl, halo, or nitro; and
R5 is alkyl, alkynyl, cyano, alkoxy, hydrogen, halo, aryl, or alkenyl. Other
materials which may be used include o-hydroxy-α-methyl cinnamoyl
derivatives. Photoremovable protective groups are described in, for example,
Patchornik, J. Am. Chem. Soc. (1970) 92:6333 and Amit et al., J. Org. Chem.
(1974) 39:192, both of which are incorporated herein by reference.

In an alternative embodiment the positive reactive group is activated for
reaction with reagents in solution. For example, a 5-bromo-7-nitro indoline
group, when bound to a carbonyl, undergoes reaction upon exposure to light
at 420 nm.

In a second alternative embodiment, the reactive group on the linker
molecule is selected from a wide variety of negative light-reactive groups
including a cinnamate group.

Alternatively, the reactive group is activated or deactivated by electron
beam lithography, x-ray lithography, or any other radiation. Suitable
reactive groups for electron beam lithography include sulfonyl. Other
methods may be used including, for example, exposure to a current source.
Other reactive groups and methods of activation may be used in light of this
disclosure.

As shown in FIG. 1, the linking molecules are preferably exposed to, for
example, light through a suitable mask 8 using photolithographic techniques
of the type known in the semiconductor industry and described in, for
example, Sze, VLSI Technology, McGraw-Hill (1983), and Mead et al.,
Introduction to VLSI Systems, Addison-Wesley (1980), which are incorporated
herein by reference for all purposes. The light may be directed at either
the surface containing the protective groups or at the back of the
substrate, so long as the substrate is transparent to the wavelength of
light needed for removal of the protective groups. In the embodiment shown
in FIG. 1, light is directed at the surface of the substrate containing the
protective groups. FIG. 1 illustrates the use of such masking techniques as
they are applied to a positive reactive group so as to activate linking
molecules and expose functional groups in areas 10a and 10b.

The mask 8 is in one embodiment a transparent support material selectively
coated with a layer of opaque material. Portions of the opaque material are
removed, leaving opaque material in the precise pattern desired on the
substrate surface. The mask is brought into close proximity with, imaged on,
or brought directly into contact with the substrate surface as shown in
FIG. 1. "Openings" in the mask correspond to locations on the substrate
where it is desired to remove photoremovable protective groups from the
substrate. Alignment may be performed using conventional alignment
techniques in which alignment marks (not shown) are used to accurately
overlay successive masks with previous patterning steps, or more
sophisticated techniques may be used. For example, interferometric
techniques such as the one described in Flanders et al., "A New
Interferometric Alignment Technique," App. Phys. Lett. (1977) 31:426-428,
which is incorporated herein by reference, may be used.

To enhance contrast of light applied to the substrate, it is desirable to
provide contrast enhancement materials between the mask and the substrate
according to some embodiments. This contrast enhancement layer may comprise
a molecule which is decomposed by light such as quinone diazide or a
material which is transiently bleached at the wavelength of interest.
Transient bleaching of materials will allow greater penetration where light
is applied, thereby enhancing contrast. Alternatively, contrast enhancement
may be provided by way of a cladded fiber optic bundle.

The light may be from a conventional incandescent source, a laser, a laser
diode, or the like. If non-collimated sources of light are used it may be
desirable to provide a thick- or multi-layered mask to prevent spreading of
the light onto the substrate. It may, further, be desirable in some
embodiments to utilize groups which are sensitive to different wavelengths
to control synthesis. For example, by using groups which are sensitive to
different wavelengths, it is possible to select branch positions in the
synthesis of a polymer or eliminate certain masking steps. Several reactive
groups along with their corresponding wavelengths for deprotection are
provided in Table 1.

TABLE 1

Group                                   Approximate Deprotection Wavelength

Nitroveratryloxy carbonyl (NVOC)        UV (300-400 nm)
Nitrobenzyloxy carbonyl (NBOC)          UV (300-330 nm)
Dimethyl dimethoxybenzyloxy carbonyl    UV (280-300 nm)
5-Bromo-7-nitroindolinyl                UV (420 nm)
o-Hydroxy-α-methyl cinnamoyl            UV (300-350 nm)
2-Oxymethylene anthraquinone            UV (350 nm)

While the invention is illustrated primarily herein by way of the use of a
mask to illuminate selected regions of the substrate, other techniques may
also be used. For example, the substrate may be translated under a modulated
laser or diode light source. Such techniques are discussed in, for example,
U.S. Pat. No. 4,719,615, which is incorporated herein by reference. In
alternative embodiments a laser galvanometric scanner is utilized. In other
embodiments the synthesis may take place on or in contact with a
conventional liquid crystal (referred to herein as a "light valve") or fiber
optic light sources. By appropriately modulating liquid crystals, light may
be selectively controlled so as to permit light to contact selected regions
of the substrate. Alternatively, synthesis may take place on the end of a
series of optical fibers to which light is selectively applied. Other means
of controlling the location of light exposure will be apparent to those of
skill in the art.

The substrate may be irradiated either in contact or not in contact with a
solution (not shown) and is, preferably, irradiated in contact with a
solution. The solution contains reagents to prevent the by-products formed
by irradiation from interfering with synthesis of the polymer according to
some embodiments. Such by-products might include, for example, carbon
dioxide, nitrosocarbonyl compounds, styrene derivatives, indole derivatives,
and products of their photochemical reactions. Alternatively, the solution
may contain reagents used to match the index of refraction of the substrate.
Reagents added to the solution may further include, for example, acidic or
basic buffers, thiols, substituted hydrazines and hydroxylamines, reducing
agents (e.g., NADH) or reagents known to react with a given functional group
(e.g., aryl nitroso + glyoxylic acid → aryl formhydroxamate + CO2).
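By way of illustration only, the use of groups sensitive to different
wavelengths can be sketched in Python using the approximate ranges listed in
Table 1. The dictionary, the function names, and the rule that two groups
are treated as separable when their listed ranges do not overlap are
assumptions made for this sketch; they are not part of the disclosure, and
real selectivity would depend on absorption spectra rather than on the
listed ranges alone.

```python
# Approximate deprotection wavelength ranges in nm, as listed in Table 1.
# "DDZ" abbreviates dimethyl dimethoxybenzyloxy carbonyl, as in the text.
TABLE_1 = {
    "NVOC": (300, 400),
    "NBOC": (300, 330),
    "DDZ": (280, 300),
    "5-bromo-7-nitroindolinyl": (420, 420),
    "o-hydroxy-alpha-methyl cinnamoyl": (300, 350),
    "2-oxymethylene anthraquinone": (350, 350),
}


def removable_at(wavelength_nm):
    """Return the groups whose listed range includes the given wavelength."""
    return sorted(
        group
        for group, (low, high) in TABLE_1.items()
        if low <= wavelength_nm <= high
    )


def non_overlapping_pairs():
    """Return pairs of groups whose listed ranges share no wavelength.

    Hypothetical criterion: such pairs are candidates for deprotection at
    two different wavelengths without a mask.
    """
    names = sorted(TABLE_1)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            (a_low, a_high), (b_low, b_high) = TABLE_1[first], TABLE_1[second]
            if a_high < b_low or b_high < a_low:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    print(removable_at(365))
    for pair in non_overlapping_pairs():
        print(pair)
```

Under this crude criterion NVOC and the 5-bromo-7-nitroindolinyl group form
a candidate pair, while NVOC and NBOC do not, because their listed ranges
overlap.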
Either concurrently with or after the irradiation step, the linker molecules
are washed or otherwise contacted with a first monomer, illustrated by "A"
in regions 12a and 12b in FIG. 2. The first monomer reacts with the
activated functional groups of the linkage molecules which have been exposed
to light. The first monomer, which is preferably an amino acid, is also
provided with a photoprotective group. The photoprotective group on the
monomer may be the same as or different than the protective group used in
the linkage molecules, and may be selected from any of the above-described
protective groups. In one embodiment, the protective group for the A monomer
is selected from the group NBOC and NVOC.

As shown in FIG. 3, the process of irradiating is thereafter repeated, with
a mask repositioned so as to remove linkage protective groups and expose
functional groups in regions 14a and 14b which are illustrated as being
regions which were protected in the previous masking step. As an alternative
to repositioning of the first mask, in many embodiments a second mask will
be utilized. In other alternative embodiments, some steps may provide for
illuminating a common region in successive steps. As shown in FIG. 3, it may
be desirable to provide separation between irradiated regions. For example,
separation of about 1-5 μm may be appropriate to account for alignment
tolerances.

As shown in FIG. 4, the substrate is then exposed to a second protected
monomer "B," producing B regions 16a and 16b. Thereafter, the substrate is
again masked so as to remove the protective groups and expose reactive
groups on A region 12a and B region 16b. The substrate is again exposed to
monomer B, resulting in the formation of the structure shown in FIG. 6. The
dimers B-A and B-B have been produced on the substrate.

A subsequent series of masking and contacting steps similar to those
described above with A (not shown) provides the structure shown in FIG. 7.
The process provides all possible dimers of B and A, i.e., B-A, A-B, A-A,
and B-B.

The substrate, the area of synthesis, and the area for synthesis of each
individual polymer could be of any size or shape. For example, squares,
ellipsoids, rectangles, triangles, circles, or portions thereof, along with
irregular geometric shapes, may be utilized. Duplicate synthesis areas may
also be applied to a single substrate for purposes of redundancy.

In one embodiment the regions 12a, 12b and 16a, 16b on the substrate will
have a surface area of between about 1 cm² and 10⁻¹⁰ cm². In some
embodiments the regions 12a, 12b and 16a, 16b have areas of less than about
10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm², 10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm²,
10⁻⁸ cm², or 10⁻¹⁰ cm². In a preferred embodiment, the regions 12a, 12b and
16a, 16b are between about 10×10 μm and 500×500 μm.

In some embodiments a single substrate supports more than about 10 different
monomer sequences and preferably more than about 100 different monomer
sequences, although in some embodiments more than about 10³, 10⁴, 10⁵, 10⁶,
10⁷, or 10⁸ different sequences are provided on a substrate. Of course,
within a region of the substrate in which a monomer sequence is synthesized,
it is preferred that the monomer sequence be substantially pure. In some
embodiments, regions of the substrate contain polymer sequences which are at
least about 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%,
80%, 90%, 95%, 96%, 97%, 98% or 99% pure.

According to some embodiments, several sequences are intentionally provided
within a single region so as to provide an initial screening for biological
activity, after which materials within regions exhibiting significant
binding are further evaluated.

IV. Details of One Embodiment of a Reactor System

FIG. 8A schematically illustrates a preferred embodiment of a reactor system
100 for synthesizing polymers on the prepared substrate in accordance with
one aspect of the invention. The reactor system includes a body 102 with a
cavity 104 on a surface thereof. In preferred embodiments the cavity 104 is
between about 50 and 1000 μm deep with a depth of about 500 μm preferred.

The bottom of the cavity is preferably provided with an array of ridges 106
which extend both into the plane of the Figure and parallel to the plane of
the Figure. The ridges are preferably about 50 to 200 μm deep and spaced at
about 2 to 3 mm. The purpose of the ridges is to generate turbulent flow for
better mixing. The bottom surface of the cavity is preferably light
absorbing so as to prevent reflection of impinging light.

A substrate 112 is mounted above the cavity 104. The substrate is provided
along its bottom surface 114 with a photoremovable protective group such as
NVOC with or without an intervening linker molecule. The substrate is
preferably transparent to a wide spectrum of light, but in some embodiments
is transparent only at a wavelength at which the protective group may be
removed (such as UV in the case of NVOC). The substrate in some embodiments
is a conventional microscope glass slide or cover slip. The substrate is
preferably as thin as possible, while still providing adequate physical
support. Preferably, the substrate is less than about 1 mm thick, more
preferably less than 0.5 mm thick, more preferably less than 0.1 mm thick,
and most preferably less than 0.05 mm thick. In alternative preferred
embodiments, the substrate is quartz or silicon.

The substrate and the body serve to seal the cavity except for an inlet port
108 and an outlet port 110. The body and the substrate may be mated for
sealing in some embodiments with one or more gaskets. According to a
preferred embodiment, the body is provided with two concentric gaskets and
the intervening space is held at vacuum to ensure mating of the substrate to
the gaskets.

Fluid is pumped through the inlet port into the cavity by way of a pump 116
which may be, for example, a model no. B-120-S made by Eldex Laboratories.
Selected fluids are circulated into the cavity by the pump, through the
cavity, and out the outlet for recirculation or disposal. The reactor may be
subjected to ultrasonic radiation and/or heated to aid in agitation in some
embodiments.

Above the substrate 112, a lens 120 is provided which may be, for example, a
2" 100 mm focal length fused silica lens. For the sake of a compact system,
a reflective mirror 122 may be provided for directing light from a light
source 124 onto the substrate. Light source 124 may be, for example, a
Xe(Hg) light source manufactured by Oriel and having model no. 66024. A
second lens 126 may be provided for the purpose of projecting a mask image
onto the substrate in combination with lens 120. This form of lithography is
referred to herein as projection printing. As will be apparent from this
disclosure, proximity printing and the like may also be used according to
some embodiments.

Light from the light source is permitted to reach only selected locations on
the substrate as a result of mask 128. Mask 128 may be, for example, a glass
slide having etched chrome thereon. The mask 128 in one embodiment is
provided with a grid of transparent locations and opaque locations. Such
masks may be manufactured by, for example, Photo Sciences, Inc. Light passes
freely through the transparent regions of the mask, but is reflected from or
absorbed by other regions. Therefore, only selected regions of the substrate
are exposed to light.

As discussed above, light valves (LCD's) may be used as an alternative to
conventional masks to selectively expose regions of the substrate.
Alternatively, fiber optic faceplates such as those available from Schott
Glass, Inc., may be used for the purpose of contrast enhancement of the mask
or as the sole means of restricting the region to which light is applied.
Such faceplates would be placed directly above or on the substrate in the
reactor shown in FIG. 8A. In still further embodiments, fly's-eye lenses,
tapered fiber optic faceplates, or the like, may be used for contrast
enhancement.

In order to provide for illumination of regions smaller than a wavelength of
light, more elaborate techniques may be utilized. For example, according to
one preferred embodiment, light is directed at the substrate by way of
molecular microcrystals on the tip of, for example, micropipettes. Such
devices are disclosed in Lieberman et al., "A Light Source Smaller Than the
Optical Wavelength," Science (1990) 247:59-61, which is incorporated herein
by reference for all purposes.

In operation, the substrate is placed on the cavity and sealed thereto. All
operations in the process of preparing the substrate are carried out in a
room lit primarily or entirely by light of a wavelength outside of the light
range at which the protective group is removed. For example, in the case of
NVOC, the room should be lit with a conventional dark room light which
provides little or no UV light. All operations are preferably conducted at
about room temperature.

A first, deprotection fluid (without a monomer) is circulated through the
cavity. The solution preferably is of 5 mM sulfuric acid in dioxane
solution which serves to keep exposed amino groups protonated and decreases
their reactivity with photolysis by-products. Absorptive materials such as
N,N-diethylamino 2,4-dinitrobenzene, for example, may be included in the
deprotection fluid, where they serve to absorb light and prevent reflection
and unwanted photolysis.

The slide is, thereafter, positioned in a light raypath from the mask such
that first locations on the substrate are illuminated and, therefore,
deprotected. In preferred embodiments the substrate is illuminated for
between about 1 and 15 minutes with a preferred illumination time of about
10 minutes at 10-20 mW/cm² with 365 nm light. The slides are neutralized
(i.e., brought to a pH of about 7) after photolysis with, for example, a
solution of di-isopropylethylamine (DIEA) in methylene chloride for about
5 minutes.
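As a worked illustration of the preferred illumination conditions, the
energy delivered per unit area is the irradiance multiplied by the exposure
time. The short Python sketch below, whose function name is hypothetical,
converts the stated 10 minutes at 10-20 mW/cm² into a dose in J/cm². It
assumes constant irradiance over the exposure and is not part of the
disclosure.

```python
def exposure_dose_j_per_cm2(irradiance_mw_per_cm2, minutes):
    """Dose in J/cm^2 for a constant irradiance (mW/cm^2) applied for a time."""
    seconds = minutes * 60.0
    # mW/cm^2 times seconds gives mJ/cm^2; divide by 1000 to obtain J/cm^2.
    return irradiance_mw_per_cm2 * seconds / 1000.0


if __name__ == "__main__":
    low = exposure_dose_j_per_cm2(10, 10)
    high = exposure_dose_j_per_cm2(20, 10)
    print(f"10 minutes at 10-20 mW/cm^2: {low:.1f} to {high:.1f} J/cm^2")
```

For the preferred 10 minute exposure this gives roughly 6 to 12 J/cm².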

The first monomer is then placed at the first locations on the substrate.
After irradiation, the slide is removed, treated in bulk, and then
reinstalled in the flow cell. Alternatively, a fluid containing the first
monomer, preferably also protected by a protective group, is circulated
through the cavity by way of pump 116. If, for example, it is desired to
attach the amino acid Y to the substrate at the first locations, the amino
acid Y (bearing a protective group on its α-nitrogen), along with reagents
used to render the monomer reactive, and/or a carrier, is circulated from a
storage container 118, through the pump, through the cavity, and back to the
inlet of the pump.

The monomer carrier solution is, in a preferred embodiment, formed by mixing
of a first solution (referred to herein as solution "A") and a second
solution (referred to herein as solution "B"). Table 2 provides an
illustration of a mixture which may be used for solution A.

TABLE 2

Representative Monomer Carrier Solution "A"

100 mg NVOC amino protected amino acid
37 mg HOBT (1-Hydroxybenzotriazole)
250 μl DMF (Dimethylformamide)
86 μl DIEA (Diisopropylethylamine)



The composition of solution B is illustrated in Table 3. Solutions A and B
are mixed and allowed to react at room temperature for about 8 minutes, then
diluted with 2 ml of DMF, and 500 μl are applied to the surface of the slide
or the solution is circulated through the reactor system and allowed to
react for about 2 hours at room temperature. The slide is then washed with
DMF, methylene chloride and ethanol.

TABLE 3

Representative Monomer Carrier Solution "B"

250 μl DMF
111 mg BOP (Benzotriazolyl-n-oxy-tris(dimethylamino)phosphonium
hexafluorophosphate)
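The quantities given in Tables 2 and 3 and in the dilution step can be
combined into a rough estimate of how much protected amino acid reaches the
slide. The Python sketch below is illustrative only. It assumes that the
liquid volumes are simply additive and that the dissolved solids contribute
no volume, neither of which is stated in the disclosure, and the constant
and function names are hypothetical.

```python
# Liquid volumes in microliters and solid mass in milligrams, read from
# Tables 2 and 3 and from the dilution step described in the text.
SOLUTION_A_UL = 250 + 86   # DMF plus DIEA (Table 2)
SOLUTION_B_UL = 250        # DMF (Table 3)
DILUTION_UL = 2000         # 2 ml of DMF added after the 8 minute reaction
AMINO_ACID_MG = 100        # NVOC-protected amino acid (Table 2)


def total_volume_ul():
    """Total liquid volume, assuming additive volumes (an approximation)."""
    return SOLUTION_A_UL + SOLUTION_B_UL + DILUTION_UL


def amino_acid_mg_per_ml():
    """Approximate amino acid concentration in the diluted mixture."""
    return AMINO_ACID_MG / (total_volume_ul() / 1000.0)


def amino_acid_mg_in_aliquot(aliquot_ul=500):
    """Approximate mass of amino acid contained in an aliquot of the mixture."""
    return AMINO_ACID_MG * aliquot_ul / total_volume_ul()


if __name__ == "__main__":
    print(f"total volume: {total_volume_ul()} ul")
    print(f"concentration: {amino_acid_mg_per_ml():.2f} mg/ml")
    print(f"in a 500 ul aliquot: {amino_acid_mg_in_aliquot():.2f} mg")
```

Under these assumptions the diluted mixture has a volume of about 2.6 ml, so
a 500 μl aliquot carries a little under one fifth of the amino acid.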



As the solution containing the monomer to be attached is circulated through
the cavity, the amino acid or other monomer will react at its carboxy
terminus with amino groups on the regions of the substrate which have been
deprotected. Of course, while the invention is illustrated by way of
circulation of the monomer through the cavity, the invention could be
practiced by way of removing the slide from the reactor and submersing it in
an appropriate monomer solution.

After addition of the first monomer, the solution containing the first amino
acid is then purged from the system. After circulation of a sufficient
amount of the DMF/methylene chloride such that removal of the amino acid can
be assured (e.g., about 50 times the volume of the cavity and carrier
lines), the mask or substrate is repositioned, or a new mask is utilized
such that second regions on the substrate will be exposed to light and the
light source 124 is engaged for a second exposure. This will deprotect
second regions on the substrate and the process is repeated until the
desired polymer sequences have been synthesized.
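The repeated cycle of masking, deprotection, and monomer coupling described
above can be illustrated with a small simulation. The Python sketch below is
a simplified model and not the method of the disclosure: the substrate is
reduced to a row of numbered sites, each mask is a set of illuminated site
indices, and coupling is assumed to be complete at every illuminated site
and absent elsewhere. The function name and the particular masks are
hypothetical.

```python
def synthesize(n_sites, cycles):
    """Simulate light-directed synthesis on a row of sites.

    cycles is a list of (mask, monomer) pairs. Each mask is a set of site
    indices that are illuminated, and therefore deprotected, before the
    protected monomer is circulated over the whole substrate.
    """
    chains = ["" for _ in range(n_sites)]
    for mask, monomer in cycles:
        for site in mask:
            # Only deprotected sites couple the monomer. Masked sites keep
            # their protective group and are left unchanged.
            chains[site] += monomer
    return chains


if __name__ == "__main__":
    # Four cycles on four sites, chosen to give every dimer of A and B.
    cycles = [
        ({0, 1}, "A"),
        ({2, 3}, "B"),
        ({0, 2}, "A"),
        ({1, 3}, "B"),
    ]
    print(synthesize(4, cycles))
```

Each string lists monomers in the order in which they were coupled, so the
first letter is the monomer nearest the substrate. With the four cycles
shown, the four sites end up carrying all four possible dimers of A and B.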

The entire derivatized substrate is then exposed to a 
receptor of interest, preferably labeled with, for exam- 
ple, a fluorescent marker, by circulation of a solution or 
suspension of the receptor through the cavity or by 
contacting the surface of the slide in bulk. The receptor 
will preferentially bind to certain regions of the sub- 
strate which contain complementary sequences. 

Antibodies are typically suspended in what is commonly referred to as
"supercocktail," which may be, for example, a solution of about 1% BSA
(bovine serum albumin), 0.5% Tween™ non-ionic detergent in PBS (phosphate
buffered saline) buffer. The antibodies are diluted into the supercocktail
buffer to a final concentration of, for example, about 0.1 to 4 μg/ml.

FIG. 8B illustrates an alternative preferred embodiment of the
reactor shown in FIG. 8A. According to this embodiment, the mask 128
is placed directly in contact with the substrate. Preferably, the
etched portion of the mask is placed face down so as to reduce the
effects of light dispersion. According to this embodiment, the
imaging lenses 120 and 126 are not necessary because the mask is
brought into close proximity with the substrate.

For purposes of increasing the signal-to-noise ratio of the
technique, some embodiments of the invention provide for exposure of
the substrate to a first labeled or unlabeled receptor followed by
exposure of a labeled, second receptor (e.g., an antibody) which
binds at multiple sites on the first receptor. If, for example, the
first receptor is an antibody derived from a first species of an
animal, the second receptor is an antibody derived from a second
species directed to epitopes associated with the first species. In
the case of a mouse antibody, for example, fluorescently labeled goat
antibody or antiserum which is antimouse may be used to bind at
multiple sites on the mouse antibody, providing several times the
fluorescence compared to the attachment of a single mouse antibody at
each binding site. This process may be repeated again with additional
antibodies (e.g., goat-mouse-goat, etc.) for further signal
amplification.

In preferred embodiments an ordered sequence of masks is utilized. In
some embodiments it is possible to use as few as a single mask to
synthesize all of the possible polymers of a given monomer set.

If, for example, it is desired to synthesize all 16 dinucleotides
from four bases, a 1 cm square synthesis region is divided
conceptually into 16 boxes, each 0.25 cm wide. Denote the four
monomer units by A, B, C, and D. The first reactions are carried out
in four vertical columns, each 0.25 cm wide. The first mask exposes
the left-most column of boxes, where A is coupled. The second mask
exposes the next column, where B is coupled; followed by a third
mask, for the C column; and a final mask that exposes the right-most
column, for D. The first, second, third, and fourth masks may be a
single mask translated to different locations.

The process is repeated in the horizontal direction for the second
unit of the dimer. This time, the masks allow exposure of horizontal
rows, again 0.25 cm wide. A, B, C, and D are sequentially coupled
using masks that expose horizontal fourths of the reaction area. The
resulting substrate contains all 16 dinucleotides of four bases.

The eight masks used to synthesize the dinucleotide are related to
one another by translation or rotation. In fact, one mask can be used
in all eight steps if it is suitably rotated and translated. For
example, in the example above, a mask with a single transparent
region could be sequentially used to expose each of the vertical
columns, rotated 90°, and then sequentially used to allow exposure of
the horizontal rows.

Tables 4 and 5 provide a simple computer program in Quick Basic for
planning a masking program and a sample output, respectively, for the
synthesis of a polymer chain of three monomers ("residues") having
three different monomers in the first level, four different monomers
in the second level, and five different monomers in the third level
in a striped pattern. The output of the program is the number of
cells, the number of "stripes" (light regions) on each mask, and the
amount of translation required for each exposure of the mask.

TABLE 4

Mask Strategy Program

DEFINT A-Z
DIM b(20), w(20), l(500)
F$ = "LPT1:"
OPEN F$ FOR OUTPUT AS #1
jmax = 3                                'Number of residues
b(1) = 3: b(2) = 4: b(3) = 5            'Number of building blocks for res 1,2,3
g = 1: lmax(1) = 1
FOR j = 1 TO jmax: g = g * b(j): NEXT j 'g = total number of cells
w(0) = 0: w(1) = g / b(1)               'w(j) = stripe width for residue j
PRINT #1, "MASK2.BAS "; DATE$, TIME$: PRINT #1,
PRINT #1, USING "Number of residues=##"; jmax
FOR j = 1 TO jmax
  PRINT #1, USING " Residue ## ## building blocks"; j; b(j)
NEXT j
PRINT #1,
PRINT #1, USING "Number of cells=####"; g: PRINT #1,
FOR j = 2 TO jmax
  lmax(j) = lmax(j - 1) * b(j - 1)      'lmax(j) = number of stripes on mask j
  w(j) = w(j - 1) / b(j)
NEXT j
FOR j = 1 TO jmax
  PRINT #1, USING "Mask for residue ##"; j: PRINT #1,
  PRINT #1, USING " Number of stripes=###"; lmax(j)
  PRINT #1, USING " Width of each stripe=###"; w(j)
  FOR l = 1 TO lmax(j)
    a = 1 + (l - 1) * w(j - 1)          'first cell of stripe l
    ae = a + w(j) - 1                   'last cell of stripe l
    PRINT #1, USING " Stripe ## begins at location ### and ends at ###"; l; a; ae
  NEXT l
  PRINT #1,
  PRINT #1, USING " For each of ## building blocks, translate mask by ## cell(s)"; b(j); w(j),
  PRINT #1, : PRINT #1, : PRINT #1,
NEXT j

© Copyright 1990, Affymax Research Institute

TABLE 5

Masking Strategy Output

Number of residues = 3
 Residue 1 3 building blocks
 Residue 2 4 building blocks
 Residue 3 5 building blocks



Number of cells = 60

Mask for residue 1
 Number of stripes = 1
 Width of each stripe = 20
 Stripe 1 begins at location 1 and ends at 20
 For each of 3 building blocks, translate mask by 20 cell(s)

Mask for residue 2
 Number of stripes = 3
 Width of each stripe = 5
 Stripe 1 begins at location 1 and ends at 5
 Stripe 2 begins at location 21 and ends at 25
 Stripe 3 begins at location 41 and ends at 45
 For each of 4 building blocks, translate mask by 5 cell(s)

Mask for residue 3
 Number of stripes = 12
 Width of each stripe = 1
 Stripe 1 begins at location 1 and ends at 1
 Stripe 2 begins at location 6 and ends at 6
 Stripe 3 begins at location 11 and ends at 11
 Stripe 4 begins at location 16 and ends at 16
 Stripe 5 begins at location 21 and ends at 21
 Stripe 6 begins at location 26 and ends at 26
 Stripe 7 begins at location 31 and ends at 31
 Stripe 8 begins at location 36 and ends at 36
 Stripe 9 begins at location 41 and ends at 41
 Stripe 10 begins at location 46 and ends at 46
 Stripe 11 begins at location 51 and ends at 51
 Stripe 12 begins at location 56 and ends at 56
 For each of 5 building blocks, translate mask by 1 cell(s)

© Copyright 1990, Affymax Research Institute
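
The stripe arithmetic of Tables 4 and 5 can also be sketched in a modern language. The following is a minimal Python sketch, not the program of Table 4 itself; the function name plan_masks and the dictionary keys are hypothetical names chosen for illustration. It assumes, as in Table 4, that the cells are laid out in one dimension and numbered from 1.

```python
from math import prod


def plan_masks(blocks):
    """Plan a striped masking strategy (illustrative sketch).

    blocks[j] is the number of building blocks at residue level j+1.
    Returns the total number of cells and, per residue, the list of
    (first_cell, last_cell) stripes, the stripe width, and the shift
    applied to the mask between building blocks.
    """
    cells = prod(blocks)            # g in Table 4
    plans = []
    stripes = 1                     # lmax(1) = 1
    prev_width = 0                  # w(0) = 0
    width = cells
    for level, n_blocks in enumerate(blocks, start=1):
        width //= n_blocks          # w(j) = w(j-1) / b(j), with w(1) = g / b(1)
        spans = [(1 + k * prev_width, k * prev_width + width)
                 for k in range(stripes)]
        plans.append({"residue": level, "stripes": spans,
                      "width": width, "blocks": n_blocks, "shift": width})
        stripes *= n_blocks         # lmax(j+1) = lmax(j) * b(j)
        prev_width = width
    return cells, plans


if __name__ == "__main__":
    cells, plans = plan_masks([3, 4, 5])
    print("Number of cells =", cells)
    for plan in plans:
        print("Mask for residue", plan["residue"],
              "stripes:", len(plan["stripes"]), "width:", plan["width"])
```

Called with the three levels of Table 4 (3, 4, and 5 building blocks), the sketch yields 60 cells and the same 1, 3, and 12 stripes listed in Table 5.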



V. Details of One Embodiment of A Fluorescent Detection Device

FIG. 9 illustrates a fluorescent detection device for detecting
fluorescently labeled receptors on a substrate. A substrate 112 is
placed on an x/y translation table 202. In a preferred embodiment the
x/y translation table is a model no. PM500-A1 manufactured by Newport
Corporation. The x/y translation table is connected to and controlled
by an appropriately programmed digital computer 204 which may be, for
example, an appropriately programmed IBM PC/AT or AT compatible
computer. Of course, other computer systems, special purpose
hardware, or the like could readily be substituted for the AT
computer used herein for illustration. Computer software for the
translation and data collection functions described herein can be
provided based on commercially available software including, for
example, "Lab Windows" licensed by National Instruments, which is
incorporated herein by reference for all purposes.

The substrate and x/y translation table are placed under a microscope
206 which includes one or more objectives 208. Light (about 488 nm)
from a laser 210, which in some embodiments is a model no. 2020-05
argon ion laser manufactured by Spectraphysics, is directed at the
substrate by a dichroic mirror 207 which passes greater than about
520 nm light but reflects 488 nm light. Dichroic mirror 207 may be,
for example, a model no. FT510 manufactured by Carl Zeiss. Light
reflected from the mirror then enters the microscope 206 which may
be, for example, a model no. Axioscop 20 manufactured by Carl Zeiss.
Fluorescein-marked materials on the substrate will fluoresce >488 nm
light, and the fluoresced light will be collected by the microscope
and passed through the mirror. The fluorescent light from the
substrate is then directed through a wavelength filter 209 and,
thereafter, through an aperture plate 211. Wavelength filter 209 may
be, for example, a model no. OG530 manufactured by Melles Griot and
aperture plate 211 may be, for example, a model no. 477352/477380
manufactured by Carl Zeiss.

The fluoresced light then enters a photomultiplier tube 212 which in
some embodiments is a model no. R943-02 manufactured by Hamamatsu.
The signal is amplified in preamplifier 214 and photons are counted
by photon counter 216. The number of photons is recorded as a
function of the location in the computer 204. Pre-Amp 214 may be, for
example, a model no. SR440 manufactured by Stanford Research Systems
and photon counter 216 may be a model no. SR400 manufactured by
Stanford Research Systems. The substrate is then moved to a
subsequent location and the process is repeated. In preferred
embodiments the data are acquired every 1 to 100 µm with a data
collection diameter of about 0.8 to 10 µm preferred. In embodiments
with sufficiently high fluorescence, a CCD (charge coupled device)
detector with broadfield illumination is utilized.

By counting the number of photons generated in a given area in
response to the laser, it is possible to determine where fluorescent
marked molecules are located on the substrate. Consequently, for a
slide which has a matrix of polypeptides, for example, synthesized on
the surface thereof, it is possible to determine which of the
polypeptides is complementary to a fluorescently marked receptor.

According to preferred embodiments, the intensity and duration of the
light applied to the substrate is controlled by varying the laser
power and scan stage rate for improved signal-to-noise ratio by
maximizing fluorescence emission and minimizing background noise.

While the detection apparatus has been illustrated primarily herein
with regard to the detection of marked receptors, the invention will
find application in other areas. For example, the detection apparatus
disclosed herein could be used in the fields of catalysis, DNA or
protein gel scanning, and the like.

VI. Determination of Relative Binding Strength of Receptors

The signal-to-noise ratio of the present invention is sufficiently
high that not only can the presence or absence of a receptor on a
ligand be detected, but also the relative binding affinity of
receptors to a variety of sequences can be determined.

In practice it is found that a receptor will bind to several peptide
sequences in an array, but will bind much more strongly to some
sequences than others. Strong binding affinity will be evidenced
herein by a strong fluorescent or radiographic signal since many
receptor molecules will bind in a region of a strongly bound ligand.
Conversely, a weak binding affinity will be evidenced by a weak
fluorescent or radiographic signal due to the relatively small number
of receptor molecules which bind in a particular region of a
substrate having a ligand with a weak binding affinity for the
receptor. Consequently, it becomes possible to determine relative
binding avidity (or affinity in the case of univalent interactions)
of a ligand herein by way of the intensity of a fluorescent or
radiographic signal in a region containing that ligand.

Semiquantitative data on affinities might also be obtained by varying
washing conditions and concentrations of the receptor. This would be
done by comparison to known ligand receptor pairs, for example.

VII. Examples

The following examples are provided to illustrate the efficacy of the
inventions herein. All operations were conducted at about ambient
temperatures and pressures unless indicated to the contrary.

A. Slide Preparation 

Before attachment of reactive groups it is preferred to clean the
substrate which is, in a preferred embodiment, a glass substrate such
as a microscope slide or cover slip. According to one embodiment the
slide is soaked in an alkaline bath consisting of, for example, 1
liter of 95% ethanol with 120 ml of water and 120 grams of sodium
hydroxide for 12 hours. The slides are then washed under running
water and allowed to air dry, and rinsed once with a solution of 95%
ethanol.

The slides are then aminated with, for example,
aminopropyltriethoxysilane for the purpose of attaching amino groups
to the glass surface on linker molecules, although any omega
functionalized silane could also be used for this purpose. In one
embodiment 0.1% aminopropyltriethoxysilane is utilized, although
solutions with concentrations from 10^-[illegible]% to 10% may be
used, with about 10^-[illegible]% to 2% preferred. A 0.1% mixture is
prepared by adding to 100 ml of a 95% ethanol/5% water mixture, 100
microliters (µl) of aminopropyltriethoxysilane. The mixture is
agitated at about ambient temperature on a rotary shaker for about 5
minutes. 500 µl of this mixture is then applied to the surface of one
side of each cleaned slide. After 4 minutes, the slides are decanted
of this solution and rinsed three times by dipping in, for example,
100% ethanol.
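
The 0.1% figure above follows from simple volume arithmetic. The short Python sketch below is offered only as an illustration; the function names are hypothetical, and it assumes the percentage is volume/volume relative to the 100 ml of solvent, neglecting the small volume added by the silane itself.

```python
def percent_v_v(solute_ul, solvent_ml):
    """Approximate volume percent of a solute added to a solvent.

    Assumes the solute volume is small compared with the solvent
    volume, so the total volume is taken as the solvent volume.
    """
    return 100.0 * solute_ul / (solvent_ml * 1000.0)


def solute_ul_for(percent, solvent_ml):
    """Microliters of solute needed for a target volume percent."""
    return percent / 100.0 * solvent_ml * 1000.0


if __name__ == "__main__":
    # 100 microliters of silane in 100 ml of ethanol/water mixture
    print(percent_v_v(100, 100))
```

Under the same assumption, the 2% upper end of the preferred range would correspond to about 2 ml of silane per 100 ml of solvent.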

After the plates dry, they are placed in a 110°-120° C. vacuum oven
for about 20 minutes, and then allowed to cure at room temperature
for about 12 hours in an argon environment. The slides are then
dipped into DMF (dimethylformamide) solution, followed by a thorough
washing with methylene chloride.

The aminated surface of the slide is then exposed to about 500 µl of,
for example, a 30 millimolar (mM) solution of NVOC-GABA (gamma amino
butyric acid) NHS (N-hydroxysuccinimide) in DMF for attachment of a
NVOC-GABA to each of the amino groups.

The surface is washed with, for example, DMF, methylene chloride, and
ethanol.

Any unreacted aminopropyl silane on the surface (that is, those amino
groups which have not had the NVOC-GABA attached) is now capped with
acetyl groups (to prevent further reaction) by exposure to a 1:3
mixture of acetic anhydride in pyridine for 1 hour. Other materials
which may perform this residual capping function include
trifluoroacetic anhydride, formicacetic anhydride, or other reactive
acylating agents. Finally, the slides are washed again with DMF,
methylene chloride, and ethanol.

B. Synthesis of Eight Trimers of "A" and "B"

FIG. 10 illustrates a possible synthesis of the eight trimers of the
two-monomer set: gly, phe (represented by "A" and "B," respectively).
A glass slide bearing silane groups terminating in
6-nitroveratryloxycarboxamide (NVOC-NH) residues is prepared as a
substrate. Active esters (pentafluorophenyl, OBt, etc.) of gly and
phe protected at the amino group with NVOC are prepared as reagents.
While not pertinent to this example, if side chain protecting groups
are required for the monomer set, these must not be photoreactive at
the wavelength of light used to protect the primary chain.

For a monomer set of size n, n×l cycles are required to synthesize
all possible sequences of length l. A cycle consists of:

1. Irradiation through an appropriate mask to expose the amino groups
   at the sites where the next residue is to be added, with
   appropriate washes to remove the by-products of the deprotection.
2. Addition of a single activated and protected (with the same
   photochemically-removable group) monomer, which will react only at
   the sites addressed in step 1, with appropriate washes to remove
   the excess reagent from the surface.

The above cycle is repeated for each member of the monomer set until
each location on the surface has been extended by one residue in one
embodiment. In other embodiments, several residues are sequentially
added at one location before moving on to the next location. Cycle
times will generally be limited by the coupling reaction rate, now as
short as 20 min in automated peptide synthesizers. This step is
optionally followed by addition of a protecting group to stabilize
the array for later testing. For some types of polymers (e.g.,
peptides), a final deprotection of the entire surface (removal of
photoprotective side chain groups) may be required.

More particularly, as shown in FIG. 10A, the glass 20 is provided
with regions 22, 24, 26, 28, 30, 32, 34, and 36. Regions 30, 32, 34,
and 36 are masked, as shown in FIG. 10B, and the glass is irradiated
and exposed to a reagent containing "A" (e.g., gly), with the
resulting structure shown in FIG. 10C. Thereafter, regions 22, 24,
26, and 28 are masked, the glass is irradiated (as shown in FIG. 10D)
and exposed to a reagent containing "B" (e.g., phe), with the
resulting structure shown in FIG. 10E. The process proceeds,
consecutively masking and exposing the sections as shown until the
structure shown in FIG. 10M is obtained. The glass is irradiated and
the terminal groups are, optionally, capped by acetylation. As shown,
all possible trimers of gly/phe are obtained.
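
The mask-and-couple cycle described above can be simulated in a few lines. The following Python sketch is illustrative only: the function name synthesize and the rule used to decide which regions each mask exposes are assumptions, chosen so that the first layer matches the text (the first four regions receive "A", the last four receive "B"); the actual mask layouts of FIGS. 10B to 10M may differ for later layers.

```python
def synthesize(monomers, length):
    """Simulate masked, layer-by-layer synthesis on n**length regions.

    Each step is one cycle: irradiate through a mask (deprotect the
    exposed regions), then couple a single monomer, which reacts only
    at the exposed regions. Returns the sequence in each region and
    the number of cycles used.
    """
    n = len(monomers)
    n_regions = n ** length
    regions = [""] * n_regions
    steps = 0
    for layer in range(length):
        # Regions are grouped in runs of `block`; within one layer, a
        # run shares the same monomer (assumed layout, see lead-in).
        block = n ** (length - 1 - layer)
        for index, monomer in enumerate(monomers):
            exposed = [r for r in range(n_regions)
                       if (r // block) % n == index]
            for r in exposed:
                regions[r] += monomer
            steps += 1
    return regions, steps


if __name__ == "__main__":
    trimers, cycles = synthesize("AB", 3)
    print(trimers, cycles)
```

With the two-monomer set and a length of three, the sketch uses six cycles and leaves a different trimer in each of the eight regions. The same function applied to four monomers and a length of two reproduces the 16-dinucleotide, eight-mask example.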

In this example, no side chain protective group removal is necessary.
If it is desired, side chain deprotection may be accomplished by
treatment with ethanedithiol and trifluoroacetic acid.

In general, the number of steps needed to obtain a particular polymer
chain is defined by:

$n \times l$  (1)

where:

n = the number of monomers in the basis set of monomers, and

l = the number of monomer units in a polymer chain.

Conversely, the synthesized number of sequences of length l will be:

$n^{l}$.  (2)

Of course, greater diversity is obtained by using masking strategies
which will also include the synthesis of polymers having a length of
less than l. If, in the extreme case, all polymers having a length
less than or equal to l are synthesized, the number of polymers
synthesized will be:

$n^{l} + n^{l-1} + \cdots + n^{1}$.  (3)
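
As a quick numerical check of expressions (1) to (3), the following Python sketch evaluates them for a given n and l. The function names are hypothetical and are used only for illustration.

```python
def lithographic_steps(n, l):
    """Expression (1): steps needed for chains of length l."""
    return n * l


def sequences_of_length(n, l):
    """Expression (2): number of distinct sequences of length l."""
    return n ** l


def sequences_up_to_length(n, l):
    """Expression (3): number of sequences of length 1 through l."""
    return sum(n ** k for k in range(1, l + 1))


if __name__ == "__main__":
    for n, l in [(2, 3), (4, 2), (20, 5)]:
        print(n, l, lithographic_steps(n, l),
              sequences_of_length(n, l), sequences_up_to_length(n, l))
```

For the two-monomer trimer example (n = 2, l = 3) this gives 6 steps, 8 sequences of length three, and 14 sequences of length three or less. The number of steps grows only linearly in l while the number of sequences grows exponentially.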

The maximum number of lithographic steps needed will generally be n
for each "layer" of monomers, i.e., the total number of masks (and,
therefore, the number of lithographic steps) needed will be n×l. The
size of the transparent mask regions will vary in accordance with the
area of the substrate available for synthesis and the number of
sequences to be formed. In general, the size of the synthesis areas
will be:

size of synthesis areas = (A)/(Sequences)

where:

A is the total area available for synthesis; and

Sequences is the number of sequences desired in the area.

[illegible] simultaneously produce thousands or millions of oligomers
on a substrate using the photolithographic techniques disclosed
herein. Consequently, the method results in the ability to
practically test large numbers of, for example, di, tri, tetra,
penta, hexa, hepta, octapeptides, dodecapeptides, or larger
polypeptides (or correspondingly, polynucleotides).

The above example has illustrated the method by way of a manual
example. It will of course be appreciated that automated or
semi-automated methods could be used. The substrate would be mounted
in a flow cell for automated addition and removal of reagents, to
minimize the volume of reagents needed, and to more carefully control
reaction conditions. Successive masks could be applied manually or
automatically.

C. Synthesis of a Dimer of an Aminopropyl Group and a Fluorescent
Group

In synthesizing the dimer of an aminopropyl group and a fluorescent
group, a functionalized durapore membrane was used as a substrate.
The durapore membrane was a polyvinylidene difluoride with
aminopropyl groups. The aminopropyl groups were protected with the
DDZ group by reaction of the carbonyl chloride with the amino groups,
a reaction readily known to those of skill in the art. The surface
bearing these groups was placed in a solution of THF and contacted
with a mask bearing a checkerboard pattern of 1 mm opaque and
transparent regions. The mask was exposed [illegible] about 280 nm
for about 5 minutes at ambient temperature, although a wide range of
exposure times and temperatures may be appropriate in various
embodiments of the invention. For example, in one embodiment, an
exposure time of between about 1 and 5000 seconds may be used at
process temperatures of between -70 and [illegible]. [illegible]
between about 1 and 500 seconds at about ambient pressure are used.
In some preferred embodiments, pressure above ambient is used to
prevent evaporation.

The surface of the membrane was then washed for about 1 hour with a
fluorescent label which included an active ester bound to a chelate
of a lanthanide. Wash times will vary over a wide range of values
from about a few minutes to a few hours. These materials fluoresce in
the red and the green visible region. After the reaction with the
active ester in the fluorophore was complete, the locations in which
the fluorophore was bound could be visualized by exposing them to
ultraviolet light and observing the red and the green fluorescence.
It was observed that the derivatized regions of the substrate closely
corresponded to the original pattern of the mask.

D. Demonstration of Signal Capability

Signal detection capability was demonstrated using a low-level
standard fluorescent bead kit manufactured by Flow Cytometry
Standards and having model no. 824. This kit includes 5.8 µm diameter
beads, each impregnated with a known number of fluorescein molecules.

One of the beads was placed in the illumination field on the scan
stage as shown in FIG. 9 in a field of a laser spot which was
initially shuttered. After being positioned in the illumination
field, the photon detection equipment was turned on. The laser beam
was un[illegible] and interacted with the particle bead, which then
fluoresced. Fluorescence curves of beads impregnated with [illegible]
fluorescein molecules are shown in FIGS. 11A and 11B, respectively.
On each curve, [illegible] fluorescein molecules [illegible].
[illegible] 488 nm excitation, with 100 [unit illegible] of laser
power. The light was focused through a 40 power 0.75 NA objective.

The fluorescence intensity in all cases started off at a high value
and then decreased exponentially. The fall-off in intensity is due to
photobleaching of the fluorescein molecules. The traces of beads
without fluorescein molecules are used for background subtraction.
The difference in the initial exponential decay between labeled and
nonlabeled beads is integrated to give the total number of photon
counts, and this number is related to the number of molecules per
bead. Therefore, it is possible to deduce the number of photons per
fluorescein molecule that can be detected. For the curves illustrated
in FIGS. 11A and 11B, this calculation indicates that about 40 to 50
photons per fluorescein molecule are detected.

E. Determination of the Number of Molecules Per Unit Area

Aminopropylated glass microscope slides prepared according to the
methods discussed above were utilized in order to establish the
density of labeling of the slides. The free amino termini of the
slides were reacted with FITC (fluorescein isothiocyanate) which
forms a covalent [illegible]. [illegible] is then scanned to count
the number of fluorescent photons generated in a region which, using
the estimated 40-50 photons per fluorescent molecule, enables the
calculation of the number of molecules which are on the surface per
unit area.

A slide with aminopropyl silane on its surface was immersed in a 1 mM
solution of FITC in DMF for 1 hour at about ambient temperature.
After reaction, the slide was washed twice with DMF and then washed
[illegible]. It was then dried and stored in the dark until it was
ready to be [illegible].

[illegible] FIGS. 11A and 11B, and by integrating the fluorescent
counts under the exponentially decaying signal, the number of free
amino groups on the surface after derivatization was determined. It
was determined that slides with labeling densities of 1 fluorescein
per 10^[illegible] × 10^[illegible] to about 2 × 2 nm could be
reproducibly made as the concentration of aminopropyltriethoxysilane
varied from 10^[illegible]% to 10^[illegible]%.

F. Removal of NVOC and Attachment of a Fluorescent Marker

NVOC-GABA groups were attached as described above. The entire surface
of one slide was exposed to light so as to expose a free amino group
at the end of the gamma amino butyric acid. This slide, and a
duplicate which was not exposed, were then exposed to fluorescein
isothiocyanate (FITC).
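
The counting arithmetic of Sections D and E above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the photon counts in the usage example are invented placeholders rather than measured data, and only the figure of 40 to 50 detected photons per fluorescein molecule and the 2 × 2 nm spacing are taken from the text.

```python
def molecules_from_counts(signal_counts, background_counts,
                          photons_per_molecule):
    """Estimate the number of fluorescent molecules in a scanned region.

    signal_counts and background_counts are integrated photon counts
    for the labeled and unlabeled regions; the difference is divided
    by the detected photons per molecule.
    """
    return (signal_counts - background_counts) / photons_per_molecule


def density_per_um2(spacing_nm):
    """Molecules per square micrometer for one molecule per
    spacing_nm x spacing_nm of surface."""
    return (1000.0 / spacing_nm) ** 2


if __name__ == "__main__":
    # Hypothetical integrated counts, for illustration only.
    low = molecules_from_counts(450_000, 0, 50)
    high = molecules_from_counts(450_000, 0, 40)
    print(low, high)
    print(density_per_um2(2))   # one fluorescein per 2 x 2 nm
```

Because the photons-per-molecule figure is itself a range, the sketch naturally produces a range of molecule counts rather than a single value.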
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FIG. 12A illustrates the slide which was not exposed 
to light, but which was exposed to FITC. The units of the x axis are
time and the units of the y axis are counts. The trace contains a
certain amount of background fluorescence. The duplicate slide was
exposed to 350 nm broadband illumination for about 1 minute (12
mW/cm², ~350 nm illumination), washed and reacted with FITC. The
fluorescence curves for this slide are shown in FIG. 12B. A large
increase in the level of fluorescence is observed, which indicates
photolysis has exposed a number of amino groups on the surface of the
slides for attachment of a fluorescent marker.

G. Use of a Mask in Removal of NVOC

The next experiment was performed with a 0.1% aminopropylated slide.
Light from a Hg-Xe arc lamp was imaged onto the substrate through a
laser-ablated chrome-on-glass mask in direct contact with the
substrate.

This slide was illuminated for approximately 5 minutes with 12 mW of
350 nm broadband light and then reacted with the 1 mM FITC solution.
It was put on the laser detection scanning stage and a graph was
plotted as a two-dimensional representation of position color-coded
for fluorescence intensity. The fluorescence intensity (in counts) as
a function of location is given on the color scale to the right of
FIG. 13A for a mask having 100×100 µm squares.

The experiment was repeated a number of times through various masks.
The fluorescence pattern for a 50 µm mask is illustrated in FIG. 13B,
for a 20 µm mask in FIG. 13C and for a 10 µm mask in FIG. 13D. The
mask pattern is distinct down to at least about 10 µm squares using
this lithographic technique.

H. Attachment of YGGFL and Subsequent Exposure to Herz Antibody and
Goat Antimouse

In order to establish that receptors to a particular polypeptide
sequence would bind to a surface-bound peptide and be detected, Leu
enkephalin was [illegible] the surface and recognized by an antibody.
A slide was derivatized with a 1% aminopropyl [illegible] and
protected with NVOC. A 500 µm checkerboard mask was used to expose
the slide in a flow cell using backside contact printing. The Leu
enkephalin sequence (H2N-tyrosine, glycine, glycine, phenylalanine,
leucine-CO2H, otherwise referred to herein as YGGFL) was attached via
its carboxy end to the exposed amino groups on the surface of the
slide. The peptide was added in DMF solution with the BOP/HOBT/DIEA
coupling reagents and recirculated through the flow cell for 2 hours
at room temperature.

A first antibody, known as the Herz antibody, was applied to the
surface of the slide for 45 minutes at 2 µg/ml in a supercocktail
(containing 1% BSA and 1% ovalbumin also in this case). A second
antibody, goat anti-mouse fluorescein conjugate, was then added at 2
µg/ml in the supercocktail buffer, and allowed to incubate for 2
hours. An image taken at 10 µm steps indicated that not only can
deprotection be carried out in a well defined pattern, but also that
(1) the method provides for successful coupling of peptides to the
surface of the substrate, (2) the surface of a bound peptide is
available for binding with an antibody, and (3) the detection
apparatus capabilities are sufficient to detect binding of a
receptor.

I. Monomer-by-Monomer Formation of YGGFL and Subsequent Exposure to
Labeled Antibody

Monomer-by-monomer synthesis of YGGFL and GGFL in alternate squares
was performed on a slide in a checkerboard pattern and the resulting
slide was exposed to the Herz antibody. This experiment and the
results thereof are illustrated in FIGS. 14A, 14B, 15A, and 15B.

In FIG. 14A, a slide is shown which is derivatized with the
aminopropyl group, protected in this case with t-BOC
(t-butoxycarbonyl). The slide was treated with TFA to remove the
t-BOC protecting group. ε-Aminocaproic acid, which was t-BOC
protected at its amino group, was then coupled onto the aminopropyl
groups. The aminocaproic acid serves as a spacer between the
aminopropyl group and the peptide to be synthesized. The amino end of
the spacer was deprotected and coupled to NVOC-leucine. The entire
slide was then illuminated with 12 mW of 325 nm broadband
illumination. The slide was then coupled with NVOC-phenylalanine and
washed. The entire slide was [illegible] washed. The slide was again
illuminated and coupled to NVOC-glycine to form the sequence shown in
the last portion of FIG. 14A.

As shown in FIG. 14B, alternating regions of the slide were then
illuminated using a projection print using a 500×500 µm checkerboard
mask; thus, the amino group of glycine was exposed only in the
lighted areas. When the next coupling chemistry step was carried out,
NVOC-tyrosine was added, and it coupled only at those spots which had
received illumination. The entire slide was then illuminated to
remove all the NVOC groups, leaving a checkerboard of YGGFL in the
lighted areas and in the other areas, GGFL. The Herz antibody (which
recognizes the YGGFL, but not GGFL) [illegible] fluorescein
conjugate.

The fluorescence scan is shown in FIG. 15A, and the color coding for
the fluorescence intensity is given on the right. Dark areas contain
the tetrapeptide GGFL, which is not recognized by the Herz antibody
(and thus there is no binding of the goat anti-mouse antibody with
fluorescein conjugate), and in the red areas YGGFL is present. The
YGGFL pentapeptide is recognized by the Herz antibody and, therefore,
there is antibody in the lighted regions for the
fluorescein-conjugated goat anti-mouse to recognize.

Similar patterns are shown for a 50 µm mask used in direct contact
("proximity print") with the substrate in FIG. 15B. Note that the
pattern is more distinct and the corners of the checkerboard pattern
are touching when the mask is placed in direct contact with the
substrate (which reflects the increase in resolution using this
technique).

J. Monomer-by-Monomer Synthesis of YGGFL and PGGFL

A synthesis using a 50 µm checkerboard mask similar to that shown in
FIG. 15B was conducted. However, P was added to the GGFL sites on the
substrate through an additional coupling step. P was added by
exposing protected GGFL to light and subsequent exposure to P in the
manner set forth above. Therefore, half of the regions on the
substrate contained YGGFL and the remaining half contained PGGFL.

The fluorescence plot for this experiment is provided in FIG. 16. As
shown, the regions are again readily discernible. This experiment
demonstrates that antibodies are able to recognize a specific
sequence and that the recognition is not length-dependent.

K. Monomer-by-Monomer Synthesis of YGGFL and YPGGFL



In order to further demonstrate the operability of the invention, a
50 µm checkerboard pattern of alternating YGGFL and YPGGFL was
synthesized on a substrate using techniques like those set forth
above. The resulting fluorescence plot is provided in FIG. 17. Again,
it is seen that the antibody is clearly able to recognize the YGGFL
sequence and does not bind significantly at the YPGGFL regions.

L. Synthesis of an Array of Sixteen Different Amino Acid Sequences
and Estimation of Relative Binding Affinity to Herz Antibody

Using techniques similar to those set forth above, an array of 16
different amino acid sequences (replicated four times) was
synthesized on each of two glass substrates. The sequences were
synthesized by attaching the sequence NVOC-GFL across the entire
surface of the slides. Using a series of masks, two layers of amino
acids were then selectively applied to the substrate. Each region had
dimensions of 0.25 cm × 0.0625 cm. The first slide contained amino
acid sequences containing only L amino acids while the second slide
contained selected D amino acids. FIGS. 18A and 18B illustrate a map
of the various regions on the first and second slides, respectively.
The patterns shown in FIGS. 18A and 18B were duplicated four times on
each slide. The slides were then exposed to the Herz antibody and
fluorescein-labeled goat anti-mouse.

FIG. 19 is a fluorescence plot of the first slide, which contained
only L amino acids. Red indicates strong binding (149,000 counts or
more) while black indicates little or no binding of the Herz antibody
(20,000 counts or less). The bottom right-hand portion of the slide
appears "cut off" because the slide was broken during processing. The
sequence YGGFL is clearly most strongly recognized. The sequences
YAGFL and YSGFL also exhibit strong recognition by the antibody. By
contrast, most of the remaining sequences show little

TABLE 6-continued

Apparent Binding to Herz Ab

[illegible]GFL

VIII. Illustrative Alternative Embodiment

According to an alternative embodiment of the invention, the methods
provide for attaching to the surface a caged binding member which in
its caged form has a relatively low affinity for other potentially
binding species, such as receptors and specific binding substances.

According to this alternative embodiment, the invention provides
methods for forming predefined regions on a surface of a solid
support, wherein the predefined regions are capable of immobilizing
receptors. The methods make use of caged binding members attached to
the surface to enable selective activation of the predefined regions.
The caged binding members are liberated to act as binding members
ultimately capable of binding receptors upon selective activation of
the predefined regions. The activated binding members are then used
to immobilize specific molecules such as receptors on the predefined
region of the surface. The above procedure is repeated at the same or
different sites on the surface so as to provide a surface prepared
with a plurality of regions on the surface containing, for example,
the same or different receptors. When receptors immobilized in this
way have a differential affinity for one or more ligands, screenings
and assays for the ligands can be conducted in the regions of the
surface containing the receptors.

The alternative embodiment may make use of novel caged binding
members attached to the substrate. Caged (unactivated) members have a
relatively low affinity for receptors of substances that specifically
bind to uncaged binding members when compared with the corresponding
affinities of activated binding members,

or no binding. The four dupUcate portions of the sUde ^Vf ' ^"J^S protected from reaction 

are extremely consistent in the amomit of binding ^td a smtable source of energy is apphe^ 

shown therein smace desired to be activated. Upon apphcation 

FIG. 20 is a fluorescence plot of the second slide. ^ f '^^^^ ^""t'^ ^"^""^ the caging groups labihze. 

Again, stronpt binding is ^^^^^^^ the YGGFL « pr J^^^^ member. A 

Y^n^"\!^"^\ ^^^^ r\ "^T """'""'"^ T Once the Ending members on the surface are acti- 

J^J^f^' YpGFL (where L-amino acids vated they may be Attached to a receptor. The receptor 

are identified by one upp^^ letter abbreviation and ^^^^ ^ ^ monoclonal antibody, a nucleic acid 

D-ammo acids are identified by one lower case letter ^ ^rug receptor, etc. The receptor will usu- 

abbre^^don).Theremam^^ ally, though not always, be prepared so as to permit 

mg with the antibody^ote the low bmdmg efficiency ^^^^^ ^^^^y indirectly, to a bmding member. 

°^rf ,^3^,^^.^ • . • . . ^ For example, a specific binding substance having a 

Table 6 hsts the various sequences tested m order of strong binding affinity for the binding member and a 
relative fluorescence, which provides information re- 55 strong affinity for the receptor or a conjugate of the 
gardmg relative bmdmg affimty. receptor may be used to act as a bridge between binding 

TABLE 6 members and receptors if desired. The method uses a 

receptor prepared such that the receptor retains its 
activity toward a particular ligand. 
60 Preferably, the caged binding member attached to the 
solid substrate ^will be a photoactivatable biotin com- 
plex, Le., a biotin molecule that has been chemically 
modified with photoactivatable protecting groups so 
that it has a significantly reduced binding afOnity for 
65 avidin or avidin analogs than does natural biotin. In a 
preferred embodiment, the protecting groups localized 
in a predefined region of the surface will be removed 
upon application of a suitable source of radiation to give 



Apparent Bindins to Herz Ab 
L-a.a. Set D^.^. Set 


YGGFL 


YGGFL 


YAGFL 


YaGFL 


YSGFL 


YjGFL 


LGGFL 


YpGFL 


FGGFL 


PGGFL 


YPGFL 


yGGFL 


LAGFL 


faGFL 


FAGFL 


wGGFL 


WGGFL • 


yaGFL 




fpGFL 
binding members, that are biotin or a functionally analogous
compound having substantially the same binding
affinity for avidin or avidin analogs as does biotin.

In another preferred embodiment, avidin or an avidin
analog is incubated with activated binding members on
the surface until the avidin binds strongly to the binding
members. The avidin so immobilized on predefined
regions of the surface can then be incubated with a
desired receptor or conjugate of a desired receptor. The
receptor will preferably be biotinylated, e.g., a
biotinylated antibody, when avidin is immobilized on the
predefined regions of the surface. Alternatively, a preferred
embodiment will present an avidin/biotinylated
receptor complex, which has been previously prepared,
to activated binding members on the surface.
IX. Conclusion 

The present inventions provide greatly improved 
methods and apparatus for synthesis of polymers on 
substrates. It is to be understood that the above descrip- 
tion is intended to be illustrative and not restrictive. 
Many embodiments will be apparent to those of skill in 
the art upon reviewing the above description. By way 
of example, the invention has been described primarily 
with reference to the use of photoremovable protective 
groups, but it will be readily recognized by those of skill 
in the art that sources of radiation other than light could 
also be used. For example, in some embodiments it may
be desirable to use protective groups which are sensitive
to electron beam irradiation or x-ray irradiation, in
combination with electron beam lithography or x-ray
lithography techniques. Alternatively, the group could
be removed by exposure to an electric current. The
scope of the invention should, therefore, be determined
not with reference to the above description, but should
instead be determined with reference to the appended
claims, along with the full scope of equivalents to which
such claims are entitled.
What is claimed is: 

1. A substrate with a surface comprising 10³ or more
groups of oligonucleotides with different, known sequences
covalently attached to the surface in discrete
known regions, said 10³ or more groups of oligonucleotides
occupying a total area of less than 1 cm² on said
substrate, said groups of oligonucleotides having different
nucleotide sequences.

2. The substrate as recited in claim 1 wherein said
substrate comprises 10⁴ or more different groups of
oligonucleotides with known sequences covalently coupled
to discrete known regions of said substrate.
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3. The substrate as recited in claim 1 wherein said 
substrate comprises 10⁵ or more different groups of
oligonucleotides with known sequences in discrete 
known regions. 

4. The substrate as recited in claim 1 wherein said 
substrate comprises 10⁶ or more different groups of
oligonucleotides with known sequences in discrete 
known regions. 

5. The substrate as recited in claim 1 wherein said 
groups of oligonucleotides are at least 50% pure within 
said discrete known regions. 

6. The substrate as recited in claim 1 wherein the 
groups of oligonucleotides are attached to the surface 
by a linker. 

7. An array of more than 1,000 different groups of
oligonucleotide molecules with known sequences covalently
coupled to a surface of a substrate, said groups of
oligonucleotide molecules each in discrete known regions
and differing from other groups of oligonucleotide
molecules in monomer sequence, each of said discrete
known regions being an area of less than about
0.01 cm² and each discrete known region comprising
oligonucleotides of known sequence, said different
groups occupying a total area of less than 1 cm².

8. The array as recited in claim 7 wherein said area is
less than 10,000 microns².

9. The array as recited in claim 7 made by the process 
of: 

exposing a first region of said substrate to light to
remove photoremovable groups from nucleic acids
in said first region, and not exposing a second region
of said surface to light;

covalently coupling a first nucleotide to said nucleic
acids on said part of said substrate exposed to light,
said first nucleotide covalently coupled to said
photoremovable group;

exposing a part of said first region of said substrate to 
light, and not exposing another part of said first 
region of said substrate to light to remove said 
photoremovable groups; 

covalently coupling a second nucleotide to said part 
of said first region exposed to light; and 

repeating said steps of exposing said substrate to light 
and covalently coupling nucleotides until said 
more than 500 different groups of nucleotides are 
formed on said surface. 

10. The array as recited in claim 7 comprising more 
than 10,000 groups of oligonucleotides of known se- 
quences. 

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ABSTRACT 



Libraries of unimolecular, double-stranded oligonucleotides
on a solid support. These libraries are useful in pharmaceutical
discovery for the screening of numerous biological
samples for specific interactions between the double-stranded
oligonucleotides, and peptides, proteins, drugs and
RNA. In a related aspect, the present invention provides
libraries of conformationally restricted probes on a solid
support. The probes are restricted in their movement and
flexibility using double-stranded oligonucleotides as scaffolding.
The probes are also useful in various screening
procedures associated with drug discovery and diagnosis.
The present invention further provides methods for the
preparation and screening of the above libraries.

6 Claims, 1 Drawing Sheet 




OTHER PUBLICATIONS

Durand, M., et al., Nucleic Acids Res., 18(21):6353-6359 (1990).
Famulok, M., et al., Angew. Chem. Int. Ed. Engl., 31:979-988 (1992).
Chattopadhyaya, R., et al., Nature, 334:175-179 (1988).
Bock, L. C., et al., Nature, 355:564-566 (1992).
Parham, Peter, Nature, 360:300-301 (1992).
Tuerk, C., et al., Science, 249:505-510 (1990).
Mergny, J.-L., et al., Nucleic Acids Res., 19(7):1521-1526 (1991).
Brossalina, E., et al., J. Am. Chem. Soc., 115:796-797 (1993).
H&ni, T., et al., Biochemistry, 29:959-965 (1990).
Cook, J., et al., Analytical Biochemistry, 190:331-339 (1990).
Cuniberti, C., et al., Biophysical Chemistry, 38:11-22 (1990).
Berman, H. M., et al., Ann. Rev. Biophys. Bioeng., 10:87-114 (1981).
White et al., "Principles of Biochemistry," New York: McGraw-Hill, 1978, pp. 124-128.


U.S. Patent          Sep. 17, 1996          5,556,752

[Drawing sheet: FIGS. 1A to 1P]

SURFACE-BOUND, UNIMOLECULAR,
DOUBLE-STRANDED DNA

GOVERNMENT RIGHTS

Research leading to the invention was funded in part by
NIH Grant No. R01HG00813-03 and the government may
have certain rights to the invention.

BACKGROUND OF THE INVENTION 

The present invention relates to the field of polymer
synthesis and the use of polymer libraries for biological
screening. More specifically, in one embodiment the invention
provides arrays of diverse double-stranded oligonucleotide
sequences. In another embodiment, the invention provides
arrays of conformationally restricted probes, wherein
the probes are held in position using double-stranded DNA
sequences as scaffolding. Libraries of diverse unimolecular
double-stranded nucleic acid sequences and probes may be
used, for example, in screening studies for determination of
binding affinity exhibited by binding proteins, drugs, or
RNA.

Methods of synthesizing desired single stranded DNA
sequences are well known to those of skill in the art. In
particular, methods of synthesizing oligonucleotides are
found in, for example, Oligonucleotide Synthesis: A Practical
Approach, Gait, ed., IRL Press, Oxford (1984), incorporated
herein by reference in its entirety for all purposes.
Synthesizing unimolecular double-stranded DNA in solution
has also been described. See, Durand, et al. Nucleic Acids
Res. 18:6353-6359 (1990) and Thomson, et al. Nucleic
Acids Res. 21:5600-5603 (1993), the disclosures of both
being incorporated herein by reference.

Solid phase synthesis of biological polymers has been
evolving since the early "Merrifield" solid phase peptide
synthesis, described in Merrifield, J. Am. Chem. Soc.
85:2149-2154 (1963), incorporated herein by reference for
all purposes. Solid-phase synthesis techniques have been
provided for the synthesis of several peptide sequences on,
for example, a number of "pins." See e.g., Geysen et al., J.
Immun. Meth. 102:259-274 (1987), incorporated herein by
reference for all purposes. Other solid-phase techniques
involve, for example, synthesis of various peptide sequences
on different cellulose disks supported in a column. See Frank
and Doring, Tetrahedron 44:6031-6040 (1988), incorporated
herein by reference for all purposes. Still other solid-phase
techniques are described in U.S. Pat. No. 4,728,502
issued to Hamill and WO 90/00626 (Beattie, inventor).

Each of the above techniques produces only a relatively
low density array of polymers. For example, the technique
described in Geysen et al. is limited to producing 96
different polymers on pins spaced in the dimensions of a
standard microtiter plate.

Improved methods of forming large arrays of oligonucleotides,
peptides and other polymer sequences in a short
period of time have been devised. Of particular note, Pirrung
et al., U.S. Pat. No. 5,143,854 (see also PCT Application No.
WO 90/15070) and Fodor et al., PCT Publication No. WO
92/10092, all incorporated herein by reference, disclose
methods of forming vast arrays of peptides, oligonucleotides
and other polymer sequences using, for example, light-directed
synthesis techniques. See also, Fodor et al., Science,
251:767-777 (1991), also incorporated herein by reference
for all purposes. These procedures are now referred to as
VLSIPS™ procedures.

In the above-referenced Fodor et al., PCT application, an
elegant method is described for using a computer-controlled
system to direct a VLSIPS™ procedure. Using this
approach, one heterogenous array of polymers is converted,
through simultaneous coupling at a number of reaction sites,
into a different heterogenous array. See, U.S. Pat. No.
5,384,261 and U.S. application Ser. No. 07/980,523, the
disclosures of which are incorporated herein for all purposes.

The development of VLSIPS™ technology as described
in the above-noted U.S. Pat. No. 5,143,854 and PCT patent
publication Nos. WO 90/15070 and 92/10092, is considered
pioneering technology in the fields of combinatorial synthesis
and screening of combinatorial libraries. More recently,
patent application Ser. No. 08/082,937, filed Jun. 25, 1993,
now abandoned, describes methods for making arrays of
oligonucleotide probes that can be used to check or determine
a partial or complete sequence of a target nucleic acid
and to detect the presence of a nucleic acid containing a
specific oligonucleotide sequence.

A number of biochemical processes of pharmaceutical
interest involve the interaction of some species, e.g., a drug,
a peptide or protein, or RNA, with double-stranded DNA.
For example, protein/DNA binding interactions are involved
with a number of transcription factors as well as tumor
suppression associated with the p53 protein and the genes
contributing to a number of cancer conditions.

SUMMARY OF THE INVENTION

High-density arrays of diverse unimolecular, double-stranded
oligonucleotides, as well as arrays of conformationally
restricted probes and methods for their use are
provided by virtue of the present invention. In addition,
methods and devices for detecting duplex formation of
oligonucleotides on an array of diverse single-stranded
oligonucleotides are also provided by this invention. Further,
an adhesive based on the specific binding characteristics
of two arrays of complementary oligonucleotides is
provided in the present invention.

According to one aspect of the present invention, libraries
of unimolecular, double-stranded oligonucleotides are provided.
Each member of the library is comprised of a solid
support, an optional spacer for attaching the double-stranded
oligonucleotide to the support and for providing sufficient
space between the double-stranded oligonucleotide and the
solid support for subsequent binding studies and assays, an
oligonucleotide attached to the spacer and further attached to
a second complementary oligonucleotide by means of a
flexible linker, such that the two oligonucleotide portions
exist in a double-stranded configuration. More particularly,
the members of the libraries of the present invention can be
represented by the formula:

Y—L¹—X¹—L²—X²

in which Y is a solid support, L¹ is a bond or a spacer, L² is
a flexible linking group, and X¹ and X² are a pair of
complementary oligonucleotides.

In a specific aspect of the invention, the library of
different unimolecular, double-stranded oligonucleotides
can be used for screening a sample for a species which binds
to one or more members of the library.

In a related aspect of the invention, a library of different
conformationally-restricted probes attached to a solid support
is provided. The individual members each have the
formula:
—X¹¹—Z—X¹²

in which X¹¹ and X¹² are complementary oligonucleotides
and Z is a probe having sufficient length such that X¹¹ and
X¹² form a double-stranded oligonucleotide portion of the
member and thereby restrict the conformations available to
the probe. In a specific aspect of the invention, the library of
different conformationally-restricted probes can be used for
screening a sample for a species which binds to one or more
probes in the library.
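
By way of illustration only, the structure of a library member described above (a solid support Y, a bond or spacer L¹, a flexible linker L², and a pair of complementary oligonucleotides X¹ and X²) can be modeled as a small record. The following is a minimal sketch, not part of the disclosed methods: the class name, field names, and example values are hypothetical, and the check assumes perfectly complementary DNA strands written 5' to 3'.

```python
from dataclasses import dataclass

# Translation table for Watson-Crick DNA pairing (A-T, C-G).
_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_PAIR)[::-1]


@dataclass(frozen=True)
class LibraryMember:
    """Hypothetical record for one member of the form Y-L1-X1-L2-X2."""
    support: str  # Y: solid support
    spacer: str   # L1: spacer, or "" when L1 is a bond
    x1: str       # X1: first oligonucleotide, 5'->3'
    linker: str   # L2: flexible linking group
    x2: str       # X2: second oligonucleotide, 5'->3'

    def forms_duplex(self) -> bool:
        """True when X2 is the exact reverse complement of X1."""
        return self.x2.upper() == reverse_complement(self.x1)

    def formula(self) -> str:
        """Render the member in the Y-L1-X1-L2-X2 order."""
        parts = [self.support, self.spacer or "(bond)",
                 self.x1, self.linker, self.x2]
        return "-".join(parts)


# Example values are illustrative assumptions only.
member = LibraryMember("glass", "", "GATTACA", "PEG", "TGTAATC")
print(member.formula(), member.forms_duplex())
```

The sketch treats only the fully complementary case; forms that include bulges or loops would need a more permissive check.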

According to yet another aspect of the present invention,
methods and devices for the bioelectronic detection of
duplex formation are provided.

According to still another aspect of the invention, an
adhesive is provided which comprises two surfaces of
complementary oligonucleotides.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGS. 1A to 1P illustrate the preparation of a member of
a library of surface-bound, unimolecular double-stranded
DNA as well as binding studies with receptors having
specificity for either the double stranded DNA portion, a
probe which is held in a conformationally restricted form by
DNA scaffolding, or a bulge or loop region of RNA.

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

Abbreviations

The following abbreviations are used herein: phi,
phenanthrenequinone diimine; phen', 5-amido-glutaric acid-1,10-phenanthroline;
dppz, dipyridophenazine.

Glossary

The following terms are intended to have the following
general meanings as they are used herein:

Chemical terms: As used herein, the term "alkyl" refers to
a saturated hydrocarbon radical which may be straight-chain
or branched-chain (for example, ethyl, isopropyl, t-amyl, or
2,5-dimethylhexyl). When "alkyl" or "alkylene" is used to
refer to a linking group or a spacer, it is taken to be a group
having two available valences for covalent attachment, for
example, —CH₂CH₂—, —CH₂CH₂CH₂—,
—CH₂CH₂CH(CH₃)CH₂— and —CH₂(CH₂CH₂)₅CH₂—.
Preferred alkyl groups as substituents are those containing 1
to 10 carbon atoms, with those containing 1 to 6 carbon
atoms being particularly preferred. Preferred alkyl or alkylene
groups as linking groups are those containing 1 to 20
carbon atoms, with those containing 3 to 6 carbon atoms
being particularly preferred. The term "polyethylene glycol"
is used to refer to those molecules which have repeating
units of ethylene glycol, for example, hexaethylene glycol
(HO—(CH₂CH₂O)₅—CH₂CH₂OH). When the term "polyethylene
glycol" is used to refer to linking groups and spacer
groups, it would be understood by one of skill in the art that
other polyethers or polyols could be used as well (i.e.,
polypropylene glycol or mixtures of ethylene and propylene
glycols).

The term "protecting group" as used herein, refers to any
of the groups which are designed to block one reactive site
in a molecule while a chemical reaction is carried out at
another reactive site. More particularly, the protecting
groups used herein can be any of those groups described in
Greene, et al., Protective Groups In Organic Chemistry, 2nd
Ed., John Wiley & Sons, New York, N.Y., 1991, incorporated
herein by reference. The proper selection of protecting
groups for a particular synthesis will be governed by the
overall methods employed in the synthesis. For example, in
"light-directed" synthesis, discussed below, the protecting
groups will be photolabile protecting groups such as NVOC,
MeNPOC, and those disclosed in co-pending Application
PCT/US93/10162 (filed Oct. 22, 1993), incorporated herein
by reference. In other methods, protecting groups may be
removed by chemical methods and include groups such as
FMOC, DMT and others known to those of skill in the art.

Complementary or substantially complementary: Refers
to the hybridization or base pairing between nucleotides or
nucleic acids, such as, for instance, between the two strands
of a double stranded DNA molecule or between an oligonucleotide
primer and a primer binding site on a single
stranded nucleic acid to be sequenced or amplified. Complementary
nucleotides are, generally, A and T (or A and U), or
C and G. Two single stranded RNA or DNA molecules are
said to be substantially complementary when the nucleotides
of one strand, optimally aligned and compared and with
appropriate nucleotide insertions or deletions, pair with at
least about 80% of the nucleotides of the other strand,
usually at least about 90% to 95%, and more preferably from
about 98 to 100%.
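
As a rough illustration of the percentage criterion above, the following minimal sketch counts the positions at which two strands pair when read antiparallel. It is a simplification under stated assumptions: the strands are of equal length and are compared without the optimal alignment, insertions, or deletions that the definition allows, and the function names and the default 80% threshold parameter are hypothetical conveniences.

```python
def pairs(a: str, b: str) -> bool:
    """True if bases a and b form an A-T, A-U, or C-G pair."""
    return {a, b} in ({"A", "T"}, {"A", "U"}, {"C", "G"})


def percent_complementary(strand: str, other: str) -> float:
    """Percent of positions in `strand` that pair with `other`.

    Both strands are written 5'->3', so `other` is read in reverse
    (antiparallel). Ungapped comparison; equal lengths are required.
    """
    if len(strand) != len(other):
        raise ValueError("this sketch compares equal-length strands only")
    if not strand:
        raise ValueError("strands must not be empty")
    matched = sum(
        pairs(x, y) for x, y in zip(strand.upper(), reversed(other.upper()))
    )
    return 100.0 * matched / len(strand)


def substantially_complementary(strand: str, other: str,
                                threshold: float = 80.0) -> bool:
    """Apply the 'at least about 80%' criterion to an ungapped comparison."""
    return percent_complementary(strand, other) >= threshold


print(percent_complementary("GATTACA", "TGTAATC"))
print(substantially_complementary("GATTACA", "TGTAATG"))
```

A gapped comparison would require a sequence alignment step before counting pairs, which this sketch deliberately omits.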

Alternatively, substantial complementarity exists when an
RNA or DNA strand will hybridize under selective hybridization
conditions to its complement. Typically, selective
hybridization will occur when there is at least about 65%
complementarity over a stretch of at least 14 to 25 nucleotides,
preferably at least about 75%, more preferably at
least about 90% complementarity. See, M. Kanehisa Nucleic
Acids Res. 12:203 (1984), incorporated herein by reference.

Stringent hybridization conditions will typically include
salt concentrations of less than about 1 M, more usually less
than about 500 mM and preferably less than about 200 mM.
Hybridization temperatures can be as low as 5° C., but are
typically greater than 22° C., more typically greater than
about 30° C., and preferably in excess of about 37° C.
Longer fragments may require higher hybridization temperatures
for specific hybridization. As other factors may
affect the stringency of hybridization, including base composition
and length of the complementary strands, presence
of organic solvents and extent of base mismatching, the
combination of parameters is more important than the absolute
measure of any one alone.

Epitope: The portion of an antigen molecule which is
delineated by the area of interaction with the subclass of
receptors known as antibodies.

Identifier tag: A means whereby one can identify which
molecules have experienced a particular reaction in the
synthesis of an oligomer. The identifier tag also records the
step in the synthesis series in which the molecules experienced
that particular monomer reaction. The identifier tag
may be any recognizable feature which is, for example:
microscopically distinguishable in shape, size, color, optical
density, etc.; differently absorbing or emitting of light;
chemically reactive; magnetically or electronically encoded;
or in some other way distinctively marked with the required
information. A preferred example of such an identifier tag is
an oligonucleotide sequence.

Ligand/Probe: A ligand is a molecule that is recognized by
a particular receptor. The agent bound by or reacting with a
receptor is called a "ligand," a term which is definitionally
meaningful only in terms of its counterpart receptor. The
term "ligand" does not imply any particular molecular size
or other structural or compositional feature other than that
the substance in question is capable of binding or otherwise
interacting with the receptor. Also, a ligand may serve either
as the natural ligand to which the receptor binds, or as a
functional analogue that may act as an agonist or antagonist.
Examples of ligands that can be investigated by this invention
include, but are not restricted to, agonists and antagonists
for cell membrane receptors, toxins and venoms, viral
epitopes, hormones (e.g., opiates, steroids, etc.), hormone
receptors, peptides, enzymes, enzyme substrates, substrate
analogs, transition state analogs, cofactors, drugs, proteins,
and antibodies. The term "probe" refers to those molecules
which are expected to act like ligands but for which binding
information is typically unknown. For example, if a receptor
is known to bind a ligand which is a peptide β-turn, a
"probe" or library of probes will be those molecules
designed to mimic the peptide β-turn. In instances where the
particular ligand associated with a given receptor is
unknown, the term probe refers to those molecules designed
as potential ligands for the receptor.

Monomer: Any member of the set of molecules which can
be joined together to form an oligomer or polymer. The set
of monomers useful in the present invention includes, but is
not restricted to, for the example of oligonucleotide synthesis,
the set of nucleotides consisting of adenine, thymine,
cytosine, guanine, and uridine (A, T, C, G, and U, respectively)
and synthetic analogs thereof. As used herein, monomers
refers to any member of a basis set for synthesis of an
oligomer. Different basis sets of monomers may be used at
successive steps in the synthesis of a polymer.

Oligomer or Polymer: The oligomer or polymer
sequences of the present invention are formed from the
chemical or enzymatic addition of monomer subunits. Such
oligomers include, for example, both linear, cyclic, and
branched polymers of nucleic acids, polysaccharides, phospholipids,
and peptides having either α-, β-, or ω-amino
acids, heteropolymers in which a known drug is covalently
bound to any of the above, polyurethanes, polyesters, polycarbonates,
polyureas, polyamides, polyethyleneimines,
polyarylene sulfides, polysiloxanes, polyimides, polyacetates,
or other polymers which will be readily apparent to
one skilled in the art upon review of this disclosure. As used
herein, the term oligomer or polymer is meant to include
such molecules as β-turn mimetics, prostaglandins and benzodiazepines
which can also be synthesized in a stepwise
fashion on a solid support.

Peptide: A peptide is an oligomer in which the monomers
are amino acids and which are joined together through
amide bonds and alternatively referred to as a polypeptide.
In the context of this specification it should be appreciated
that when α-amino acids are used, they may be the L-optical
isomer or the D-optical isomer. Other amino acids which are
useful in the present invention include unnatural amino acids
such as β-alanine, phenylglycine, homoarginine and the like.
Peptides are more than two amino acid monomers long, and
often more than 20 amino acid monomers long. Standard
abbreviations for amino acids are used (e.g., P for proline).
These abbreviations are included in Stryer, Biochemistry,
Third Ed., (1988), which is incorporated herein by reference
for all purposes.

Oligonucleotides: An oligonucleotide is a single-stranded
DNA or RNA molecule, typically prepared by synthetic
means. Alternatively, naturally occurring oligonucleotides,
or fragments thereof, may be isolated from their natural
sources or purchased from commercial sources. Those
oligonucleotides employed in the present invention will be 4 to
100 nucleotides in length, preferably from 6 to 30 nucleotides,
although oligonucleotides of different length may be
appropriate. Suitable oligonucleotides may be prepared by
the phosphoramidite method described by Beaucage and
Carruthers, Tetrahedron Lett., 22:1859-1862 (1981), or by
the triester method according to Matteucci, et al., J. Am.
Chem. Soc., 103:3185 (1981), both incorporated herein by
reference, or by other chemical methods using either a
commercial automated oligonucleotide synthesizer or
VLSIPS™ technology (discussed in detail below). When
oligonucleotides are referred to as "double-stranded," it is
understood by those of skill in the art that a pair of
oligonucleotides exist in a hydrogen-bonded, helical array
typically associated with, for example, DNA. In addition to
the 100% complementary form of double-stranded oligonucleotides,
the term "double-stranded" as used herein is
also meant to refer to those forms which include such
structural features as bulges and loops, described more fully
in such biochemistry texts as Stryer, Biochemistry, Third
Ed., (1988), previously incorporated herein by reference for
all purposes.

Receptor: A molecule that has an affinity for a given
ligand or probe. Receptors may be naturally-occurring or
manmade molecules. Also, they can be employed in their
unaltered natural or isolated state or as aggregates with other
species. Receptors may be attached, covalently or noncovalently,
to a binding member, either directly or via a
specific binding substance. Examples of receptors which can
be employed by this invention include, but are not restricted
to, antibodies, cell membrane receptors, monoclonal antibodies
and antisera reactive with specific antigenic determinants
(such as on viruses, cells or other materials), drugs,
polynucleotides, nucleic acids, peptides, cofactors, lectins,
sugars, polysaccharides, cells, cellular membranes, and
organelles. Receptors are sometimes referred to in the art as
anti-ligands. As the term receptors is used herein, no difference
in meaning is intended. A "ligand-receptor pair" is
formed when two molecules have combined through
molecular recognition to form a complex. Other examples of
receptors which can be investigated by this invention
include but are not restricted to:

a) Microorganism rcccpion: Determination of Iigands or. 
probes that bind 10 receptors, such as sped fie u^nspon 
pmidns or enzymes essential 10 survival of microor- 
ganisms, is useful in a new class of antibiotics. Of 
panicular value would be antibiotics against opporm- 
nistic fungi, protozoa, and those bacteria resistant to the 
aniibiutics in current use. 

b) Enzymes: For instance, the binding site of enzymes 
such as the enzymes responsible for cleaving ncu- 
roiransmitun^s. Dcurrminaiion of Iigands or probes that . 
bind to ccruin receptors, and thus modulate the action 
of the cn/ymcs that cleave the different neurotransmit- 
ters, is useful in the dcvdopment of drugs that can be 
used in the ucaimcni of disorders of neurotransmission. 

c) Antibodies: For instance, the invention may be useful
in investigating the ligand-binding site on the antibody
molecule which combines with the epitope of an antigen
of interest. Determining a sequence that mimics an
antigenic epitope may lead to the development of
vaccines of which the immunogen is based on one or
more of such sequences, or lead to the development of
related diagnostic agents or compounds useful in therapeutic
treatments such as for autoimmune diseases
(e.g., by blocking the binding of the "self" antibodies).

d) Nucleic Acids: The invention may be useful in investigating
sequences of nucleic acids acting as binding
sites for cellular proteins ("trans-acting factors"). Such
sequences may include, e.g., transcription factors, suppressors,
enhancers or promoter sequences.

e) Catalytic Polypeptides: Polymers, preferably polypeptides,
which are capable of promoting a chemical
reaction involving the conversion of one or more
reactants to one or more products. Such polypeptides
generally include a binding site specific for at least one
reactant or reaction intermediate and an active functionality
proximate to the binding site, which functionality
is capable of chemically modifying the bound
reactant. Catalytic polypeptides are described in
Lerner, R.A. et al., Science 252: 659 (1991), which is
incorporated herein by reference.

f) Hormone receptors: For instance, the receptors for
insulin and growth hormone. Determination of the
ligands which bind with high affinity to a receptor is
useful in the development of, for example, an oral
replacement of the daily injections which diabetics
must take to relieve the symptoms of diabetes, and in
the other case, a replacement for the scarce human
growth hormone that can only be obtained from cadavers
or by recombinant DNA technology. Other
examples are the vasoconstrictive hormone receptors;
determination of those ligands that bind to a receptor
may lead to the development of drugs to control blood
pressure.

g) Opiate receptors: Determination of ligands that bind to
the opiate receptors in the brain is useful in the development
of less-addictive replacements for morphine
and related drugs.

Substrate or Solid Support: A material having a rigid or
semi-rigid surface. Such materials will preferably take the
form of plates or slides, small beads, pellets, disks or other
convenient forms, although other forms may be used. In
some embodiments, at least one surface of the substrate will
be substantially flat. In other embodiments, a roughly spherical
shape is preferred.

Synthetic: Produced by in vitro chemical or enzymatic
synthesis. The synthetic libraries of the present invention
may be contrasted with those in viral or plasmid vectors, for
instance, which may be propagated in bacterial, yeast, or
other living hosts.

DESCRIPTION OF THE INVENTION 

The broad concept of the present invention is illustrated in
FIGS. 1A to 1F. FIGS. 1A, 1B and 1C illustrate the preparation
of surface-bound unimolecular double-stranded DNA,
while FIGS. 1D, 1E, and 1F illustrate uses for the libraries
of the present invention.

FIG. 1A shows a solid support 1 having an attached spacer
2, which is optional. Attached to the distal end of the spacer
is a first oligomer 3, which can be attached as a single unit
or synthesized on the support or spacer in a monomer by
monomer approach. FIG. 1B shows a subsequent stage in
the preparation of one member of a library according to the
present invention. In this stage, a flexible linker 4 is attached
to the distal end of the oligomer 3. In other embodiments, the
flexible linker will be a probe. FIG. 1C shows the completed
surface-bound unimolecular double-stranded DNA which is
one member of a library, wherein a second oligomer 5 is now
attached to the distal end of the flexible linker (or probe). As
shown in FIG. 1C, the length of the flexible linker (or probe)
4 is sufficient such that the first and second oligomers (which
are complementary) exist in a double-stranded conformation.
It will be appreciated by one of skill in the art that the
libraries of the present invention will contain multiple,
individually synthesized members which can be screened for
various types of activity. Three such binding events are
illustrated in FIGS. 1D, 1E and 1F.



In FIG. 1D, a receptor 6, which can be a protein, RNA
molecule or other molecule which is known to bind to DNA,
is introduced to the library. Determining which member of
a library binds to the receptor provides information which is
useful for diagnosing diseases, sequencing DNA or RNA,
identifying genetic characteristics, or in drug discovery.

In FIG. 1E, the linker 4 is a probe for which binding
information is sought. The probe is held in a conformationally
restricted manner by the flanking oligomers 3 and 5,
which are present in a double-stranded conformation. As a
result, a library of conformationally restricted probes can be
screened for binding activity with a receptor 7 which has
specificity for the probe.

The present invention also contemplates the preparation
of libraries of unimolecular, double-stranded oligonucleotides
having bulges or loops in one of the strands as
depicted in FIG. 1F. In FIG. 1F, one oligonucleotide 5 is
shown as having a bulge 8. Specific RNA bulges are often
recognized by proteins (e.g., TAR RNA is recognized by the
TAT protein of HIV). Accordingly, libraries of RNA bulges
or loops are useful in a number of diagnostic applications.
One of skill in the art will appreciate that the bulge or loop
can be present in either oligonucleotide portion 3 or 5.

Libraries of Unimolecular, Double-Stranded Oligonucleotides

In one aspect, the present invention provides libraries of
unimolecular double-stranded oligonucleotides, each member
of the library having the formula:

Y-L¹-X¹-L²-X²

in which Y represents a solid support, X¹ and X² represent
a pair of complementary oligonucleotides, L¹ represents a
bond or a spacer, and L² represents a linking group having
sufficient length such that X¹ and X² form a double-stranded
oligonucleotide.
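
As an illustration only, the formula above can be modelled as a small record in which X² is derived from X¹ as its reverse complement, so that the two strands can pair as an antiparallel duplex. The Python sketch below is not part of the original disclosure: the class `LibraryMember`, the helper names, and the default support, spacer and linker labels are hypothetical, only DNA bases are handled, and bulges or loops are not modelled.

```python
from dataclasses import dataclass

# Watson-Crick partners for DNA (an RNA version would pair A with U).
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(x1: str, x2: str) -> float:
    """Percentage of positions at which x2 pairs with x1 in an antiparallel duplex.

    Both strands are written 5'->3' and must be the same length.
    """
    if len(x1) != len(x2):
        raise ValueError("this sketch only compares strands of equal length")
    target = reverse_complement(x1)
    matches = sum(a == b for a, b in zip(target, x2.upper()))
    return 100.0 * matches / len(x1)


@dataclass(frozen=True)
class LibraryMember:
    """Hypothetical record for one member of the form Y-L1-X1-L2-X2."""
    support: str  # Y
    spacer: str   # L1 (an empty string stands for a simple bond)
    x1: str       # first oligonucleotide, 5'->3'
    linker: str   # L2
    x2: str       # second oligonucleotide, 5'->3'

    def complementarity(self) -> float:
        return percent_complementarity(self.x1, self.x2)


def make_member(x1: str, support: str = "glass", spacer: str = "PEG",
                linker: str = "hexaethyleneglycol") -> LibraryMember:
    """Build a member whose X2 is fully complementary to X1."""
    return LibraryMember(support, spacer, x1.upper(), linker,
                         reverse_complement(x1))


member = make_member("ATGCGT")
print(member.x2, member.complementarity())  # prints: ACGCAT 100.0
```

A member with a single mismatch in a 10-mer would report 90%, the lower end of the complementarity range discussed for X¹ and X².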

The solid support may be biological, nonbiological,
organic, inorganic, or a combination of any of these, existing
as particles, strands, precipitates, gels, sheets, tubing,
spheres, containers, capillaries, pads, slices, films, plates,
slides, etc. The solid support is preferably flat but may take
on alternative surface configurations. For example, the solid
support may contain raised or depressed regions on which
synthesis takes place. In some embodiments, the solid
support will be chosen to provide appropriate light-absorbing
characteristics. For example, the support may be a
polymerized Langmuir Blodgett film, functionalized glass,
Si, Ge, GaAs, GaP, SiO₂, SiN₄, modified silicon, or any one
of a variety of gels or polymers such as (poly)tetrafluoroethylene,
(poly)vinylidendifluoride, polystyrene, polycarbonate,
or combinations thereof. Other suitable solid support
materials will be readily apparent to those of skill in the art.
Preferably, the surface of the solid support will contain
reactive groups, which could be carboxyl, amino, hydroxyl,
thiol, or the like. More preferably, the surface will be
optically transparent and will have surface Si-OH functionalities,
such as are found on silica surfaces.

Attached to the solid support is an optional spacer, L¹. The
spacer molecules are preferably of sufficient length to permit
the double-stranded oligonucleotides in the completed member
of the library to interact freely with molecules exposed
to the library. The spacer molecules, when present, are
typically 6-50 atoms long to provide sufficient exposure for
the attached double-stranded DNA molecule. The spacer, L¹,
is comprised of a surface attaching portion and a longer
chain portion. The surface attaching portion is that part of L¹
which is directly attached to the solid support. This portion
can be attached to the solid support via carbon-carbon bonds
using, for example, supports having (poly)trifluorochloroethylene
surfaces, or preferably, by siloxane bonds (using,
for example, glass or silicon oxide as the solid support).
Siloxane bonds with the surface of the support are formed in
one embodiment via reactions of surface attaching portions
bearing trichlorosilyl or trialkoxysilyl groups. The surface
attaching groups will also have a site for attachment of the
longer chain portion. For example, groups which are suitable
for attachment to a longer chain portion would include
amines, hydroxyl, thiol, and carboxyl. Preferred surface
attaching portions include aminoalkylsilanes and
hydroxyalkylsilanes. In particularly preferred embodiments, the
surface attaching portion of L¹ is either
bis(2-hydroxyethyl)-aminopropyltriethoxysilane,
2-hydroxyethylaminopropyltriethoxysilane,
aminopropyltriethoxysilane or hydroxypropyltriethoxysilane.

The longer chain portion can be any of a variety of
molecules which are inert to the subsequent conditions for
polymer synthesis. These longer chain portions will typically
be aryl acetylene, ethylene glycol oligomers containing
2-14 monomer units, diamines, diacids, amino acids, peptides,
or combinations thereof. In some embodiments, the
longer chain portion is a polynucleotide. The longer chain
portion which is to be used as part of L¹ can be selected
based upon its hydrophilic/hydrophobic properties to
improve presentation of the double-stranded oligonucleotides
to certain receptors, proteins or drugs. The longer
chain portion of L¹ can be constructed of polyethyleneglycols,
polynucleotides, alkylene, polyalcohol, polyester,
polyamine, polyphosphodiester and combinations thereof.
Additionally, for use in synthesis of the libraries of the
invention, L¹ will typically have a protecting group, attached
to a functional group (i.e., hydroxyl, amino or carboxylic
acid) on the distal or terminal end of the chain portion
(opposite the solid support). After deprotection and coupling,
the distal end is covalently bound to an oligomer.

Attached to the distal end of L¹ is an oligonucleotide, X¹,
which is a single-stranded DNA or RNA molecule. The
oligonucleotides which are part of the present invention are
typically of from about 4 to about 100 nucleotides in length.
Preferably, X¹ is an oligonucleotide which is about 6 to
about 30 nucleotides in length. The oligonucleotide is typically
linked to L¹ via the 3'-hydroxyl group of the oligonucleotide
and a functional group on L¹ which results in the
formation of an ether, ester, carbamate or phosphate ester
linkage.

Attached to the distal end of X¹ is a linking group, L²,
which is flexible and of sufficient length that X¹ can effectively
hybridize with X². The length of the linker will
typically be a length which is at least the length spanned by
two nucleotide monomers, and preferably at least four
nucleotide monomers, while not being so long as to interfere
with either the pairing of X¹ and X² or any subsequent
assays. The linking group itself will typically be an alkylene
group (of from about 6 to about 24 carbons in length), a
polyethyleneglycol group (of from about 2 to about 24
ethyleneglycol monomers in a linear configuration), a
polyalcohol group, a polyamine group (e.g., spermine, spermidine
and polymeric derivatives thereof), a polyester group
(e.g., poly(ethyl acrylate) having of from 3 to 15 ethyl
acrylate monomers in a linear configuration), a
polyphosphodiester group, or a polynucleotide (having from about 2
to about 12 nucleic acids). Preferably, the linking group will
be a polyethyleneglycol group which is at least a
tetraethyleneglycol, and more preferably, from about 1 to 4
hexaethyleneglycols linked in a linear array. For use in synthesis
of the compounds of the invention, the linking group will be
provided with functional groups which can be suitably
protected or activated. The linking group will be covalently
attached to each of the complementary oligonucleotides, X¹
and X², by means of an ether, ester, carbamate, phosphate
ester or amine linkage. The flexible linking group L² will be
attached to the 5'-hydroxyl of the terminal monomer of X¹
and to the 3'-hydroxyl of the initial monomer of X². Preferred
linkages are phosphate ester linkages which can be
formed in the same manner as the oligonucleotide linkages
which are present in X¹ and X². For example,
hexaethyleneglycol can be protected on one terminus with a
photolabile protecting group (i.e., NVOC or MeNPOC) and
activated on the other terminus with
2-cyanoethyl-N,N-diisopropylamino-chlorophosphite to form a
phosphoramidite. This linking group can then be used for
construction of the libraries in the same manner as the
photolabile-protected, phosphoramidite-activated nucleotides.
Alternatively, ester linkages to X¹ and X² can be formed when the L²
has terminal carboxylic acid moieties (using the 5'-hydroxyl of
X¹ and the 3'-hydroxyl of X²). Other methods of forming
ether, carbamate or amine linkages are known to those of
skill in the art and particular reagents and references can be
found in such texts as March, Advanced Organic Chemistry,
4th Ed., Wiley-Interscience, New York, N.Y., 1992,
incorporated herein by reference.

The oligonucleotide, X², which is covalently attached to
the distal end of the linking group is, like X¹, a single-stranded
DNA or RNA molecule. The oligonucleotides
which are part of the present invention are typically of from
about 4 to about 100 nucleotides in length. Preferably, X² is
an oligonucleotide which is about 6 to about 30 nucleotides
in length and exhibits complementarity to X¹ of from 90 to
100%. More preferably, X¹ and X² are 100% complementary.
In one group of embodiments, either X¹ or X² will
further comprise a bulge or loop portion and exhibit
complementarity of from 90 to 100% over the remainder of the
oligonucleotide.

In a particularly preferred embodiment, the solid support
is a silica support, the spacer is a polyethyleneglycol
conjugated to an aminoalkylsilane, the linking group is a
polyethyleneglycol group, and X¹ and X² are complementary
oligonucleotides each comprising of from 6 to 30
nucleic acid monomers.

The library can have virtually any number of different
members, and will be limited only by the number or variety
of compounds desired to be screened in a given application
and by the synthetic capabilities of the practitioner. In one
group of embodiments, the library will have from 2 up to
100 members. In other groups of embodiments, the library
will have between 100 and 10000 members, and between
10000 and 1000000 members, preferably on a solid support.
In preferred embodiments, the library will have a density of
more than 100 members at known locations per cm²,
preferably more than 1000 per cm², more preferably more than
10,000 per cm².
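
To give a sense of scale, the densities quoted above can be converted into a spacing between neighbouring members. The sketch below assumes the members sit on a regular square grid, which the text does not require; the function name is hypothetical and the numbers are simple geometry rather than figures taken from the disclosure.

```python
import math


def max_feature_pitch_um(members_per_cm2: float) -> float:
    """Centre-to-centre spacing, in micrometres, of a square grid at this density."""
    if members_per_cm2 <= 0:
        raise ValueError("density must be positive")
    # A square grid with n sites per cm^2 has sqrt(n) sites along each 1 cm edge;
    # 1 cm is 10,000 micrometres.
    return 1e4 / math.sqrt(members_per_cm2)


for density in (100, 1_000, 10_000):
    pitch = max_feature_pitch_um(density)
    print(f"{density:>6} members per cm^2 -> pitch of roughly {pitch:.0f} micrometres")
```

Under that assumption, moving from 100 to 10,000 members per cm² shrinks the spacing tenfold, from about 1 mm to about 0.1 mm.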

Libraries of Conformationally Restricted Probes

In still another aspect, the present invention provides
libraries of conformationally-restricted probes. Each of the
members of the library comprises a solid support having an
optional spacer which is attached to an oligomer of the
formula:

-X¹¹-Z-X¹²

in which X¹¹ and X¹² are complementary oligonucleotides
and Z is a probe. The probe will have sufficient length such
that X¹¹ and X¹² form a double-stranded DNA portion of
each member. X¹¹ and X¹² are as described above for X¹ and
X² respectively, except that for the present aspect of the
invention, each member of the probe library can have the
same X¹¹ and the same X¹², and differ only in the probe
portion. In one group of embodiments, X¹¹ and X¹² are
either a poly-A oligonucleotide or a poly-T oligonucleotide.
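
A probe library of this kind can be pictured as a list of records that share the same two arms and differ only in Z. The following Python sketch is illustrative and rests on assumptions: the class and function names are hypothetical, the arms are taken to be poly-A and poly-T of one common length, and the probe strings in the usage line are arbitrary placeholders rather than probes named in the text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ProbeMember:
    """Hypothetical record for one member of the form -X11-Z-X12."""
    x11: str    # first arm, here poly-A
    probe: str  # Z, the only part that varies between members
    x12: str    # second arm, here poly-T, complementary to x11


def build_probe_library(probes: Sequence[str], arm_length: int = 10) -> List[ProbeMember]:
    """Give every probe the same poly-A / poly-T flanking arms."""
    if not 6 <= arm_length <= 30:
        # 6 to 30 is the preferred oligonucleotide length range given for X1 and X2.
        raise ValueError("arm_length is expected to be between 6 and 30")
    x11 = "A" * arm_length
    x12 = "T" * arm_length
    return [ProbeMember(x11, z, x12) for z in probes]


library = build_probe_library(["GRGDS", "YIGSR"], arm_length=8)
for member in library:
    print(member.x11, member.probe, member.x12)
```

Because the arms are constant, any difference in binding between members can be attributed to the probe portion alone.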

As noted above, each member of the library will typically
have a different probe portion. The probes, Z, can be any of
a variety of structures for which receptor-probe binding
information is sought for conformationally-restricted forms.
For example, the probe can be an agonist or antagonist for
a cell membrane receptor, a toxin, venom, viral epitope,
hormone, peptide, enzyme, cofactor, drug, protein or antibody.
In one group of embodiments, the probes are different
peptides, each having of from about 4 to about 12 amino
acids. Preferably the probes will be linked via polyphosphate
diesters, although other linkages are also suitable. For
example, the last monomer employed on the X¹¹ chain can
be a 5'-aminopropyl-functionalized phosphoramidite nucleotide
(available from Glen Research, Sterling, Va., USA or
Genosys Biotechnologies, The Woodlands, Tex., USA)
which will provide a synthesis initiation site for the carboxy
to amino synthesis of the peptide probe. Once the peptide
probe is formed, a 3'-succinylated nucleoside (from Cruachem,
Sterling, Va., USA) will be added under peptide
coupling conditions. In yet another group of embodiments,
the probes will be oligonucleotides of from 4 to about 30
nucleic acid monomers which will form a DNA or RNA
hairpin structure. For use in synthesis, the probes can also
have associated functional groups (i.e., hydroxyl, amino,
carboxylic acid, anhydride and derivatives thereof) for
attaching two positions on the probe to each of the complementary
oligonucleotides.

The surface of the solid support is preferably provided
with a spacer molecule, although it will be understood that
the spacer molecules are not elements of this aspect of the
invention. Where present, the spacer molecules will be as
described above for L¹.

The libraries of conformationally restricted probes can
also have virtually any number of members. As above, the
number of members will be limited only by design of the
particular screening assay for which the library will be used,
and by the synthetic capabilities of the practitioner. In one
group of embodiments, the library will have from 2 to 100
members. In other groups of embodiments, the library will
have between 100 and 10000 members, and between 10000
and 1000000 members. Also as above, in preferred embodiments,
the library will have a density of more than 100
members at known locations per cm², preferably more than
1000 per cm², more preferably more than 10,000 per cm².

Preparation of the Libraries

The present invention further provides methods for the
preparation of diverse unimolecular, double-stranded
oligonucleotides on a solid support. In one group of embodiments,
the surface of a solid support has a plurality of
preselected regions. An oligonucleotide of from 6 to 30
monomers is formed on each of the preselected regions. A
linking group is then attached to the distal end of each of the
oligonucleotides. Finally, a second oligonucleotide is
formed on the distal end of each linking group such that the
second oligonucleotide is complementary to the oligonucleotide
already present in the same preselected region. The
linking group used will have sufficient length such that the
complementary oligonucleotides form a unimolecular,
double-stranded oligonucleotide. In another group of
embodiments, each chemically distinct member of the
library will be synthesized on a separate solid support.



Libraries on a Single Substrate
Light-Directed Methods

For those embodiments using a single solid support, the
oligonucleotides of the present invention can be formed
using a variety of techniques known to those skilled in the
art of polymer synthesis on solid supports. For example,
"light directed" methods (which are one technique in a
family of methods known as VLSIPS™ methods) are
described in U.S. Pat. No. 5,143,854, previously incorporated
by reference. The light directed methods discussed in
the '854 patent involve activating predefined regions of a
substrate or solid support and then contacting the substrate
with a preselected monomer solution. The predefined
regions can be activated with a light source, typically shone
through a mask (much in the manner of photolithography
techniques used in integrated circuit fabrication). Other
regions of the substrate remain inactive because they are
blocked by the mask from illumination and remain chemically
protected. Thus, a light pattern defines which regions
of the substrate react with a given monomer. By repeatedly
activating different sets of predefined regions and contacting
different monomer solutions with the substrate, a diverse
array of polymers is produced on the substrate. Of course,
other steps such as washing unreacted monomer solution
from the substrate can be used as necessary. Other techniques
include mechanical techniques such as those
described in PCT No. 92/10183, U.S. Pat. No. 5,384,261,
also incorporated herein by reference for all purposes. Still
further techniques include bead based techniques such as
those described in PCT US/93/04145, also incorporated
herein by reference, and pin based methods such as those
described in U.S. Pat. No. 5,288,514, also incorporated
herein by reference.
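
The mask-and-couple cycle of the light-directed approach can be sketched as a loop in which each cycle names the regions the mask leaves exposed and the monomer then brought into contact with the substrate. This is a schematic model written for illustration, not the procedure of the '854 patent: the function name and the coordinate scheme for regions are assumptions, and capping, oxidation and washing are left out.

```python
from typing import Dict, Iterable, List, Set, Tuple

Site = Tuple[int, int]  # (row, column) label for one predefined region


def light_directed_synthesis(sites: Iterable[Site],
                             cycles: List[Tuple[Set[Site], str]]) -> Dict[Site, str]:
    """Grow one polymer per site; a monomer couples only where the mask admits light."""
    polymers: Dict[Site, str] = {site: "" for site in sites}
    for illuminated, monomer in cycles:
        unknown = illuminated - set(polymers)
        if unknown:
            raise ValueError(f"mask exposes regions not on the substrate: {sorted(unknown)}")
        for site in illuminated:
            # Illuminated regions are deprotected and therefore couple the monomer.
            polymers[site] += monomer
        # Regions hidden by the mask stay protected and are left unchanged.
    return polymers


sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
cycles = [
    ({(0, 0), (0, 1)}, "A"),  # mask 1 exposes the top row
    ({(1, 0), (1, 1)}, "C"),  # mask 2 exposes the bottom row
    ({(0, 0), (1, 0)}, "G"),  # mask 3 exposes the left column
    ({(0, 1), (1, 1)}, "T"),  # mask 4 exposes the right column
]
for site, polymer in sorted(light_directed_synthesis(sites, cycles).items()):
    print(site, polymer)
```

Four masking cycles are enough here to place a different dimer (AG, AT, CG, CT) at each of the four regions, which is the sense in which repeated activation of different sets of regions yields a diverse array.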

The VLSIPS™ methods are preferred for making the
compounds and libraries of the present invention. The
surface of a solid support, optionally modified with spacers
having photolabile protecting groups such as NVOC and
MeNPOC, is illuminated through a photolithographic mask,
yielding reactive groups (typically hydroxyl groups) in the
illuminated regions. A 3'-O-phosphoramidite activated
deoxynucleoside (protected at the 5'-hydroxyl with a
photolabile protecting group) is then presented to the surface
and chemical coupling occurs at sites that were exposed to
light. Following capping and oxidation, the substrate is
rinsed and the surface illuminated through a second mask, to
expose additional hydroxyl groups for coupling. A second
5'-protected, 3'-O-phosphoramidite activated deoxynucleoside
is presented to the surface. The selective photodeprotection
and coupling cycles are repeated until the desired set
of oligonucleotides is produced. Alternatively, an oligomer
of from, for example, 4 to 30 nucleotides can be added to
each of the preselected regions rather than synthesizing each
member in a monomer by monomer approach. At this point
in the synthesis, either a flexible linking group or a probe can
be attached in a similar manner. For example, a flexible
linking group such as polyethylene glycol will typically
have an activating group (i.e., a phosphoramidite) on one
end and a photolabile protecting group attached to the other
end. Suitably derivatized polyethylene glycol linking groups
can be prepared by the methods described in Durand, et al.,
Nucleic Acids Res. 18:6355-6359 (1990). Briefly, a
polyethylene glycol (i.e., hexaethylene glycol) can be
mono-protected using MeNPOC-chloride. Following purification
of the mono-protected glycol, the remaining hydroxy moiety
can be activated with
2-cyanoethyl-N,N-diisopropylaminochlorophosphite. Once the
flexible linking group has been attached to the first
oligonucleotide (X¹), deprotection and
coupling cycles will proceed using 5'-protected,
3'-O-phosphoramidite activated deoxynucleosides or intact oligomers.
Probes can be attached in a manner similar to that used for
the flexible linking group. When the desired probe is itself
an oligomer, it can be formed either in stepwise fashion on
the immobilized oligonucleotide or it can be separately
synthesized and coupled to the immobilized oligomer in a
single step. For example, preparation of conformationally
restricted β-turn mimetics will typically involve synthesis of
an oligonucleotide as described above, in which the last
nucleoside monomer will be derivatized with an
aminoalkyl-functionalized phosphoramidite. See, U.S. Pat. No.
5,288,514, previously incorporated by reference. The desired
peptide probe is typically formed in the direction from
carboxyl to amine terminus. Subsequent coupling of a
3'-succinylated nucleoside, for example, provides the first
monomer in the construction of the complementary
oligonucleotide strand (which is carried out by the above methods).
Alternatively, a library of probes can be prepared by
first derivatizing a solid support with multiple poly(A) or
poly(T) oligonucleotides which are suitably protected with
photolabile protecting groups, deprotecting at known sites
and constructing the probe at those sites, then coupling the
complementary poly(T) or poly(A) oligonucleotide.

Flow Channel or Spotting Methods

Additional methods applicable to library synthesis on a
single substrate are described in co-pending applications
Ser. No. 07/980,523, filed Nov. 20, 1992, and U.S. Pat. No.
5,384,261, incorporated herein by reference for all purposes.
In the methods disclosed in these applications, reagents are
delivered to the substrate by either (1) flowing within a
channel defined on predefined regions or (2) "spotting" on
predefined regions. However, other approaches, as well as
combinations of spotting and flowing, may be employed. In
each instance, certain activated regions of the substrate are
mechanically separated from other regions when the monomer
solutions are delivered to the various reaction sites.

A typical "flow channel" method applied to the compounds
and libraries of the present invention can generally
be described as follows. Diverse polymer sequences are
synthesized at selected regions of a substrate or solid support
by forming flow channels on a surface of the substrate
through which appropriate reagents flow or in which appropriate
reagents are placed. For example, assume a monomer
"A" is to be bound to the substrate in a first group of selected
regions. If necessary, all or part of the surface of the
substrate in all or a part of the selected regions is activated
for binding by, for example, flowing appropriate reagents
through all or some of the channels, or by washing the entire
substrate with appropriate reagents. After placement of a
channel block on the surface of the substrate, a reagent
having the monomer A flows through or is placed in all or
some of the channel(s). The channels provide fluid contact
to the first selected regions, thereby binding the monomer A
on the substrate directly or indirectly (via a spacer) in the
first selected regions.

Thereafter, a monomer B is coupled to second selected
regions, some of which may be included among the first
selected regions. The second selected regions will be in fluid
contact with a second flow channel(s) through translation,
rotation, or replacement of the channel block on the surface
of the substrate; through opening or closing a selected valve;
or through deposition of a layer of chemical or photoresist.
If necessary, a step is performed for activating at least the
second regions. Thereafter, the monomer B is flowed
through or placed in the second flow channel(s), binding
monomer B at the second selected locations. In this particular
example, the resulting sequences bound to the substrate
at this stage of processing will be, for example, A, B, and
AB. The process is repeated to form a vast array of
sequences of desired length at known locations on the
substrate.

After the substrate is activated, monomer A can be flowed
through some of the channels, monomer B can be flowed
through other channels, a monomer C can be flowed through
still other channels, etc. In this manner, many or all of the
reaction regions are reacted with a monomer before the
channel block must be moved or the substrate must be
washed and/or reactivated. By making use of many or all of
the available reaction regions simultaneously, the number of
washing and activation steps can be minimized.
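
The flow channel example can be made concrete with a small grid model in which a first pass runs channels along the rows and a second pass, after the channel block is rotated by 90 degrees, runs channels along the columns. The sketch is an assumption-laden illustration: the function name is hypothetical, rotation is just one of the repositioning options the text lists, and a channel left empty is represented by `None`.

```python
from typing import List, Optional, Sequence


def flow_channel_grid(row_channels: Sequence[Optional[str]],
                      column_channels: Sequence[Optional[str]]) -> List[List[str]]:
    """Sequence bound at each cell after a row pass followed by a column pass.

    row_channels[i] is the monomer flowed through row channel i, or None if
    that channel carries no monomer; column_channels works the same way for
    the second pass with the channel block rotated.
    """
    grid = [["" for _ in column_channels] for _ in row_channels]
    # First pass: every cell along a row channel binds that channel's monomer.
    for i, monomer in enumerate(row_channels):
        if monomer is not None:
            for j in range(len(column_channels)):
                grid[i][j] += monomer
    # Second pass: every cell along a column channel binds that channel's monomer.
    for j, monomer in enumerate(column_channels):
        if monomer is not None:
            for i in range(len(row_channels)):
                grid[i][j] += monomer
    return grid


# Monomer A in the first row channel only, then monomer B in the second column channel only.
for row in flow_channel_grid(["A", None], [None, "B"]):
    print(row)
```

With those inputs the occupied cells carry A, AB and B, matching the three outcomes named in the text; filling every channel in both passes with a different monomer, as in the A, B, C variant, gives every row and column combination in two steps.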

One of skill in the art will recognize that there are
alternative methods of forming channels or otherwise protecting
a portion of the surface of the substrate. For example,
according to some embodiments, a protective coating such
as a hydrophilic or hydrophobic coating (depending upon
the nature of the solvent) is utilized over portions of the
substrate to be protected, sometimes in combination with
materials that facilitate wetting by the reactant solution in
other regions. In this manner, the flowing solutions are
further prevented from passing outside of their designated
flow paths.

The "spotting" methods of preparing compounds and
libraries of the present invention can be implemented in
much the same manner as the flow channel methods. For
example, a monomer A can be delivered to and coupled with
a first group of reaction regions which have been appropriately
activated. Thereafter, a monomer B can be delivered to
and reacted with a second group of activated reaction
regions. Unlike the flow channel embodiments described
above, reactants are delivered by directly depositing (rather
than flowing) relatively small quantities of them in selected
regions. In some steps, of course, the entire substrate surface
can be sprayed or otherwise coated with a solution. In
preferred embodiments, a dispenser moves from region to
region, depositing only as much monomer as necessary at
each stop. Typical dispensers include a micropipette to
deliver the monomer solution to the substrate and a robotic
system to control the position of the micropipette with
respect to the substrate, or an ink-jet printer. In other
embodiments, the dispenser includes a series of tubes, a
manifold, an array of pipettes, or the like so that various
reagents can be delivered to the reaction regions simultaneously.

Pin-Based Methods 

Another method which is useful for the preparation of
compounds and libraries of the present invention involves
"pin based synthesis." This method is described in detail in
U.S. Pat. No. 5,288,514, previously incorporated herein by
reference. The method utilizes a substrate having a plurality
of pins or other extensions. The pins are each inserted
simultaneously into individual reagent containers in a tray.
In a common embodiment, an array of 96 pins/containers is
utilized.

Each tray is filled with a particular reagent for coupling in
a particular chemical reaction on an individual pin. Accordingly,
the trays will often contain different reagents. Since
the chemistry disclosed herein has been established such that
a relatively similar set of reaction conditions may be utilized
to perform each of the reactions, it becomes possible to
conduct multiple chemical coupling steps simultaneously. In
the first step of the process the invention provides for the use
of substrate(s) on which the chemical coupling steps are
conducted. The substrate is optionally provided with a
spacer having active sites. In the particular case of
oligonucleotides, for example, the spacer may be selected from a
wide variety of molecules which can be used in organic
environments associated with synthesis as well as aqueous
environments associated with binding studies. Examples of
suitable spacers are polyethyleneglycols, dicarboxylic acids,
polyamines and alkylenes, substituted with, for example,
methoxy and ethoxy groups. Additionally, the spacers will
have an active site on the distal end. The active sites are
optionally protected initially by protecting groups. Among a
wide variety of protecting groups which are useful are
FMOC, BOC, t-butyl esters, t-butyl ethers, and the like.
Various exemplary protecting groups are described in, for
example, Atherton et al., Solid Phase Peptide Synthesis, IRL
Press (1989), incorporated herein by reference. In some
embodiments, the spacer may provide for a cleavable function
by way of, for example, exposure to acid or base.
Libraries on Multiple Substrates 
Bead Based Methods 

Yet another method which is useful for synthesis of
compounds and libraries of the present invention involves
"bead based synthesis." A general approach for bead based
synthesis is described in copending application Ser. Nos.
07/762,522 (filed Sep. 18, 1991, now abandoned); 07/946,239
(filed Sep. 16, 1992); 08/146,886 (filed Nov. 2, 1993);
07/876,792 (filed Apr. 29, 1992) and PCT/US93/04145
(filed Apr. 28, 1993), the disclosures of which are
incorporated herein by reference.

For the synthesis of molecules such as oligonucleotides
on beads, a large plurality of beads are suspended in a
suitable carrier (such as water) in a container. The beads are
provided with optional spacer molecules having an active
site. The active site is protected by an optional protecting
group.

In a first step of the synthesis, the beads are divided for
coupling into a plurality of containers. For the purposes of
this brief description, the number of containers will be
limited to three, and the monomers denoted as A, B, C, D,
E, and F. The protecting groups are then removed and a first
portion of the molecule to be synthesized is added to each of
the three containers (i.e., A is added to container 1, B is
added to container 2 and C is added to container 3).

Thereafter, the various beads are appropriately washed of
excess reagents, and remixed in one container. Again, it will
be recognized that by virtue of the large number of beads
utilized at the outset, there will similarly be a large number
of beads randomly dispersed in the container, each having a
particular first portion of the monomer to be synthesized on
a surface thereof.

Thereafter, the various beads are again divided for coupling
in another group of three containers. The beads in the
first container are deprotected and exposed to a second
monomer (D), while the beads in the second and third
containers are coupled to molecule portions E and F
respectively. Accordingly, molecules AD, BD, and CD will be
present in the first container, while AE, BE, and CE will be
present in the second container, and molecules AF, BF, and
CF will be present in the third container. Each bead, however,
will have only a single type of molecule on its surface.
Thus, all of the possible molecules formed from the first
portions A, B, C and the second portions D, E, and F have
been formed.
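
By way of illustration only, the divide, couple and recombine steps described above can be sketched as a short simulation. The function name split_and_pool, the bead count and the fixed random seed are assumptions introduced for this sketch and do not appear in the description; the monomer labels A through F are those used above.

```python
import random
from collections import Counter


def split_and_pool(n_beads, rounds, seed=0):
    """Simulate divide/couple/recombine synthesis on beads.

    rounds: one list of monomers per coupling round; each monomer
    stands for one reaction container in that round.
    Returns the sequence carried by each bead.
    """
    rng = random.Random(seed)
    beads = [""] * n_beads
    for monomers in rounds:
        rng.shuffle(beads)  # recombined beads are mixed before dividing
        # Divide the beads among one container per monomer.
        containers = [beads[i::len(monomers)] for i in range(len(monomers))]
        # Couple one monomer in each container, then recombine.
        beads = [seq + m for m, group in zip(monomers, containers) for seq in group]
    return beads


library = split_and_pool(9000, [["A", "B", "C"], ["D", "E", "F"]])
counts = Counter(library)
for molecule in sorted(counts):
    print(molecule, counts[molecule])
```

Each simulated bead carries exactly one two-monomer molecule, and with a large number of beads every combination of a first portion and a second portion is expected to be represented.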

The beads are then recombined into one container and
additional steps are conducted to complete the
synthesis of the polymer molecules. In a preferred embodiment,
the beads are tagged with an identifying tag which is
unique to the particular double-stranded oligonucleotide or
probe which is present on each bead. A complete description
of identifier tags for use in synthetic libraries is provided in
co-pending application Ser. No. 08/146,886 (filed Nov. 2,
1993) previously incorporated by reference for all purposes.

Methods of Library Screening 

A library prepared according to any of the methods
described above can be used to screen for receptors having
high affinity for either unimolecular, double-stranded
oligonucleotides or conformationally restricted probes. In one
group of embodiments, a solution containing a marked
(labelled) receptor is introduced to the library and incubated
for a suitable period of time. The library is then washed free
of unbound receptor and the probes or double-stranded
oligonucleotides having high affinity for the receptor are
identified by identifying those regions on the surface of the
library where markers are located. Suitable markers include,
but are not limited to, radiolabels, chromophores, fluorophores,
chemiluminescent moieties, and transition metals.
Alternatively, the presence of receptors may be detected
using a variety of other techniques, such as an assay with a
labelled enzyme, antibody, and the like. Other techniques
using various marker systems for detecting bound receptor
will be readily apparent to those skilled in the art.

In a preferred embodiment, a library prepared on a single
solid support (using, for example, the VLSIPS™ technique)
can be exposed to a solution containing marked receptor
such as a marked antibody. The receptor can be marked in
any of a variety of ways, but in one embodiment marking is
effected with a radioactive label. The marked antibody binds
with high affinity to an immobilized antigen previously
localized on the surface. After washing the surface free of
unbound receptor, the surface is placed proximate to x-ray
film or phosphorimagers to identify the antigens that are
recognized by the antibody. Alternatively, a fluorescent
marker may be provided and detection may be by way of a
charge-coupled device (CCD), fluorescence microscopy or
laser scanning.

When autoradiography is the detection method used, the
marker is a radioactive label, such as ³²P. The marker on the
surface is exposed to X-ray film or a phosphorimager, which
is developed and read out on a scanner. An exposure time of
about 1 hour is typical in one embodiment. Fluorescence
detection using a fluorophore label, such as fluorescein,
attached to the receptor will usually require shorter exposure
times.

Quantitative assays for receptor concentrations can also
be performed according to the present invention. In a direct
assay method, the surface containing localized probes
prepared as described above is incubated with a solution
containing a marked receptor for a suitable period of time.
The surface is then washed free of unbound receptor. The
amount of marker present at predefined regions of the
surface is then measured and can be related to the amount of
receptor in solution. Methods and conditions for performing
such assays are well-known and are presented in, for
example, L. Hood et al., Immunology, Benjamin/Cummings
(1978), and E. Harlow et al., Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratory (1988). See also
U.S. Pat. No. 4,376,110 for methods of performing sandwich
assays. The precise conditions for performing these steps
will be apparent to one skilled in the art.
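
One way of relating a measured amount of marker to the amount of receptor in solution is interpolation on a standard curve obtained with receptor solutions of known concentration. The following is a minimal sketch under that assumption; the function name receptor_concentration and the calibration values are hypothetical and are not taken from the description or from the cited references.

```python
def receptor_concentration(signal, standards):
    """Estimate receptor concentration by linear interpolation.

    standards: (concentration, signal) pairs measured with known receptor
    solutions; the signal is assumed to rise strictly with concentration.
    """
    points = sorted(standards, key=lambda p: p[1])
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal lies outside the calibrated range")
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= signal <= s_hi:
            fraction = (signal - s_lo) / (s_hi - s_lo)
            return c_lo + fraction * (c_hi - c_lo)


# Hypothetical calibration: (concentration, marker signal) in arbitrary units.
standards = [(0.0, 50.0), (1.0, 150.0), (2.0, 250.0)]
print(receptor_concentration(200.0, standards))  # prints 1.5
```

Linear interpolation is chosen here only for simplicity; a measured binding curve may call for a different model.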

A competitive assay method for two receptors can also be
employed using the present invention. Methods of conducting
competitive assays are known to those of skill in the art.
One such method involves immobilizing conformationally
restricted probes on predefined regions of a surface as
described above. An unmarked first receptor is then bound
to the probes on the surface having a known specific binding
affinity for the receptors. A solution containing a marked
second receptor is then introduced to the surface and
incubated for a suitable time. The surface is then washed free of
unbound reagents and the amount of marker remaining on
the surface is measured. In another form of competition
assay, marked and unmarked receptors can be exposed to the
surface simultaneously. The amount of marker remaining on
predefined regions of the surface can be related to the
amount of unknown receptor in solution. Yet another form of
competition assay will utilize two receptors having different
labels, for example, two different chromophores.

In other embodiments, in order to detect receptor binding,
the double-stranded oligonucleotides which are formed with
attached probes or with a flexible linking group will be
treated with an intercalating dye, preferably a fluorescent
dye. The library can be scanned to establish a background
fluorescence. After exposure of the library to a receptor
solution, the exposed library will be scanned or illuminated
and examined for those areas in which fluorescence has
changed. Alternatively, the receptor of interest can be
labeled with a fluorescent dye by methods known to those of
skill in the art and incubated with the library of probes. The
library can then be scanned or illuminated, as above, and
examined for areas of fluorescence.

In instances where the libraries are synthesized on beads
in a number of containers, the beads are exposed to a
receptor of interest. In a preferred embodiment the receptor
is fluorescently or radioactively labelled. Thereafter, one or
more beads are identified that exhibit significant levels of,
for example, fluorescence using one of a variety of
techniques. For example, in one embodiment, mechanical
separation under a microscope is utilized. The identity of the
molecule on the surface of such separated beads is then
identified using, for example, NMR, mass spectrometry,
PCR amplification and sequencing of the associated DNA,
or the like. In another embodiment, automated sorting (i.e.,
fluorescence activated cell sorting) can be used to separate
beads (bearing probes) which bind to receptors from those
which do not bind. Typically the beads will be labeled and
identified by methods disclosed in Needels et al., Proc.
Natl. Acad. Sci. USA 90:10700-10704 (1993), incorporated
herein by reference.

The assay methods described above for the libraries of the
present invention will have tremendous application in such
endeavors as DNA "footprinting" of proteins which bind
DNA. Currently, DNA footprinting is conducted using
DNase I digestion of double-stranded DNA in the presence
of a putative DNA binding protein. Gel analysis of cut and
protected DNA fragments then provides a "footprint" of
where the protein contacts the DNA. This method is both
labor and time intensive. See Galas et al., Nucleic Acids Res.
5:3157 (1978). Using the above methods, a "footprint" could
be produced using a single array of unimolecular,
double-stranded oligonucleotides in a fraction of the time of
conventional methods. Typically, the protein will be labeled
with a radioactive or fluorescent species and incubated with
a library of unimolecular, double-stranded DNA.
Phosphorimaging or fluorescence detection will provide a footprint
of those regions on the library where the protein has bound.
Alternatively, unlabeled protein can be used. When unlabeled
protein is used, the double-stranded oligonucleotides
in the library will all be labeled with a marker, typically a
fluorescent marker. Incorporation of a marker into each
member of the library can be carried out by terminating the
oligonucleotide synthesis with a commercially available
fluorescing phosphoramidite nucleotide derivative. Following
incubation with the unlabeled protein, the library will be
treated with DNase I and examined for areas which are
protected from cleavage.

The assay methods described above for the libraries of the
present invention can also be used in reverse drug discovery.
In such an application, a compound having known
pharmacological safety or other desired properties (e.g., aspirin)
could be screened against a variety of double-stranded
oligonucleotides for potential binding. If the compound is
shown to bind to a sequence associated with, for example,
tumor suppression, the compound can be further examined
for efficacy in the related diseases.

In other embodiments, probe arrays comprising β-turn
mimetics can be prepared and assayed for activity against a
particular receptor. β-Turn mimetics are compounds having
molecular structures similar to β-turns, which are one of the
three major components in protein molecular architecture.
β-Turns are similar in concept to hairpin turns of
oligonucleotide strands, and are often critical recognition features for
various protein-ligand and protein-protein interactions. As a
result, a library of β-turn mimetic probes can provide or
suggest new therapeutic agents having a particular affinity
for a receptor which will correspond to the affinity exhibited
by the β-turn and its receptor.

Bioelectronic Devices and Methods

In another aspect, the present invention provides a method
for the bioelectronic detection of sequence-specific
oligonucleotide hybridization. A general method and device
which is useful in diagnostics in which a biochemical
species is attached to the surface of a sensor is described in
U.S. Pat. No. 4,562,157 (the Lowe patent), incorporated
herein by reference. The present method utilizes arrays of
immobilized oligonucleotides (prepared, for example, using
VLSIPS™ technology) and the known photo-induced
electron transfer which is mediated by a DNA double helix
structure. See Murphy et al., Science 262:1025-1029
(1993). This method is useful in hybridization-based
diagnostics, as a replacement for fluorescence-based detection
systems. The method of bioelectronic detection also offers
higher resolution and potentially higher sensitivity than
earlier diagnostic methods involving sequencing/detecting
by hybridization. As a result, this method finds applications
in genetic mutation screening and primary sequencing of
oligonucleotides. The method can also be used for Sequencing
By Hybridization (SBH), which is described in co-pending
application Ser. Nos. 08/082,937 (filed Jun. 25,
1993, now abandoned) and 08/168,904 (filed Dec. 15, 1993),
each of which is incorporated herein by reference for all
purposes. This method uses a set of short oligonucleotide
probes of defined sequence to search for complementary
sequences on a longer target strand of DNA. The
hybridization pattern is used to reconstruct the target DNA
sequence. Thus, the hybridization analysis of large numbers
of probes can be used to sequence long stretches of DNA. In
immediate applications of this hybridization methodology, a
small number of probes can be used to interrogate local
DNA sequence.

In the present inventive method, hybridization is monitored
using bioelectronic detection. In this method, the target
DNA, or first oligonucleotide, is provided with an
electron-donor tag and then incubated with an array of
oligonucleotide probes, each of which bears an electron-acceptor tag
and occupies a known position on the surface of the array.
After hybridization of the first oligonucleotide to the array
has occurred, the hybridized array is illuminated to induce
an electron transfer reaction in the direction of the surface of
the array. The electron transfer reaction is then detected at
the location on the surface where hybridization has taken
place. Typically, each of the oligonucleotide probes in an
array will have an attached electron-acceptor tag located
near the surface of the solid support used in preparation of
the array. In embodiments in which the arrays are prepared
by light-directed methods (i.e., typically 3' to 5' direction),
the electron-acceptor tag will be located near the 3' position.
The electron-acceptor tag can be attached either to the 3'
monomer by methods known to those of skill in the art, or
it can be attached to a spacing group between the 3'
monomer and the solid support. Such a spacing group will
have, in addition to functional groups for attachment to the
solid support and the oligonucleotide, a third functional
group for attachment of the electron-acceptor tag. The target
oligonucleotide will typically have the electron-donor tag
attached at the 3' position. Alternatively, the target
oligonucleotide can be incubated with the array in the absence of
an electron-donor tag. Following incubation, the
electron-donor tag can be added in solution. The electron-donor tag
will then intercalate into those regions where hybridization
has occurred. An electron transfer reaction can then be
detected in those regions having a continuous DNA double
helix.
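
The Sequencing By Hybridization approach referred to above reconstructs a target sequence from the pattern of short probes that hybridize to it. The following is a minimal sketch of that reconstruction step for an idealized, error-free pattern. The function names and the example target are assumptions made for illustration; each hybridizing probe is recorded here as the stretch of target it complements, and every overlap is assumed to be unique, which real targets containing repeats will not satisfy.

```python
def probe_pattern(target, k):
    """Idealized hybridization pattern: every k-base stretch of the target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(kmers):
    """Rebuild a target from its k-mer pattern by chaining overlaps.

    Assumes an error-free pattern in which each (k-1)-base overlap
    occurs only once, so the k-mers form a single unbranched chain.
    """
    kmers = set(kmers)
    by_prefix = {}
    for kmer in kmers:
        if kmer[:-1] in by_prefix:
            raise ValueError("ambiguous overlap; pattern is not a single chain")
        by_prefix[kmer[:-1]] = kmer
    suffixes = {kmer[1:] for kmer in kmers}
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting probe")
    current = starts[0]
    sequence = current
    for _ in range(len(kmers) - 1):
        current = by_prefix.get(current[1:])
        if current is None:
            raise ValueError("probes do not form a single chain")
        sequence += current[-1]  # each step extends the target by one base
    return sequence


target = "GATTACAGC"  # hypothetical target, for illustration only
print(reconstruct(probe_pattern(target, 4)))  # prints GATTACAGC
```

In the bioelectronic method described above, the pattern would be read from the array locations at which an electron transfer signal is detected rather than computed from a known target.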

The electron-donor tag can be any of a variety of complexes
which participate in electron transfer reactions and
which can be attached to an oligonucleotide by a means
which does not interfere with the electron transfer reaction.
In preferred embodiments, the electron-donor tag is a
ruthenium (II) complex, more preferably a ruthenium (II)
(phen')2(dppz) complex.

The electron-acceptor tag can be any species which, with
the electron-donor tag, will participate in an electron transfer
reaction. An example of an electron-acceptor tag is a
rhodium (III) complex. A preferred electron-acceptor tag is
a rhodium (III) (phi)2(phen') complex.

In a particularly preferred embodiment, the electron-donor
tag is a ruthenium (II) (phen')2(dppz) complex and the
electron-acceptor tag is a rhodium (III) (phi)2(phen') complex.

In still another aspect, the present invention provides a
device for the bioelectronic detection of sequence-specific
oligonucleotide hybridization. The device will typically
consist of a sensor having a surface to which an array of
oligonucleotides are attached. The oligonucleotides will be
attached in pre-defined areas on the surface of the sensor and
have an electron-acceptor tag attached to each oligonucleotide.
The electron-acceptor tag will be a tag which is
capable of producing an electron transfer signal upon
illumination of a hybridized species, when the complementary
oligonucleotide bears an electron-donating tag. The signal
will be in the direction of the sensor surface and be detected
by the sensor.

In a preferred embodiment, the sensor surface will be a
silicon-based surface which can sense the electronic signal
induced and, if necessary, amplify the signal. The metal
contacts on which the probes will be synthesized can be
treated with an oxygen plasma prior to synthesis of the
probes to enhance the silane adhesion and concentration on
the surface. The surface will further comprise a multi-gated
field effect transistor, with each gate serving as a sensor and
different oligonucleotides attached to each gate. The
oligonucleotides will typically be attached to the metal contacts
on the sensor surface by means of a spacer group.

The spacer group should not be too long, in order to
ensure that the sensing function of the device is easily
activated by the binding interaction and subsequent
illumination of the "tagged" hybridized oligonucleotides.
Preferably, the spacer group is from 3 to 12 atoms in length and
will be as described above for the surface modifying portion
of the spacer group, L¹.

The oligonucleotides which are attached to the spacer
group can be formed by any of the solid phase techniques
which are known to those of skill in the art. Preferably, the
oligonucleotides are formed one base at a time in the
direction of the 3' terminus to the 5' terminus by the
"light-directed" methods described above. The oligonucleotide
can then be modified at the 3' end to attach the
electron-acceptor tag. A number of suitable methods of
attachment are known. For example, modification with the
reagent Aminolink2 (from Applied Biosystems, Inc.) provides
a terminal phosphate moiety which is derivatized with
an aminohexyl phosphate ester. Coupling of a carboxylic
acid, which is present on the electron-acceptor tag, to the
amine can then be carried out using HOBT and DCC.
Alternatively, synthesis of the oligonucleotide can begin
with a suitably derivatized and protected monomer which
can then be deprotected and coupled to the electron-acceptor
tag once the complete oligonucleotide has been synthesized.

The silica surface can also be replaced by silicon nitride
or oxynitride, or by an oxide of another metal, especially
aluminum, titanium (IV) or iron (III). The surface can also
be any other film, membrane, insulator or semiconductor
overlying the sensor which will not interfere with the
detection of electron transfer and to which an
oligonucleotide can be coupled.

Additionally, detection devices other than an FET can be
used. For example, sensors such as bipolar transistors, MOS
transistors and the like are also useful for the detection of
electron transfer signals.
Adhesives

In still another aspect, the present invention provides an
adhesive comprising a pair of surfaces, each having a
plurality of attached oligonucleotides, wherein the
single-stranded oligonucleotides on one surface are complementary
to the single-stranded oligonucleotides on the other surface.
The strength and position/orientation specificity can be
controlled using a number of factors including the number
and length of oligonucleotides on each surface, the degree of
complementarity, and the spatial arrangement of complementary
oligonucleotides on the surface. For example, increasing
the number and length of the oligonucleotides on each
surface will provide a stronger adhesive. Suitable lengths of
oligonucleotides are typically from about 10 to about 70
nucleotides. Additionally, the surfaces of oligonucleotides
can be prepared such that adhesion occurs in an extremely
position-specific manner by a suitable arrangement of
complementary oligonucleotides in a specific pattern. Small
deviations from the optimum spatial arrangement are
energetically unfavorable as many hybridization bonds must be
broken and are not reformed in any other relative orientation.
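
The position specificity described above can be illustrated with a deliberately crude model that counts Watson-Crick base pairs between two strands at different relative offsets. This is a sketch only: counting pairs is a stand-in for hybridization energy, and the function names and the example sequence are assumptions introduced here rather than part of the description.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that pairs fully with seq (both written 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_bases(strand_a, strand_b, offset=0):
    """Count base pairs when strand_b binds strand_a antiparallel,
    displaced by `offset` positions from perfect registration."""
    partner = reverse_complement(strand_b)  # what strand_a must read to pair
    pairs = 0
    for i, base in enumerate(strand_a):
        j = i + offset
        if 0 <= j < len(partner) and partner[j] == base:
            pairs += 1
    return pairs


surface_one = "GATTACAGCT"  # hypothetical 10-mer on the first surface
surface_two = reverse_complement(surface_one)  # its partner on the second
for shift in (0, 1, 2):
    print(shift, paired_bases(surface_one, surface_two, shift))
```

For this example sequence the number of pairs falls sharply as soon as the strands are displaced from registration, which is the qualitative behavior the passage attributes to small deviations from the optimum arrangement.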

The adhesives of the present invention will find use in
numerous applications. Generally, the adhesives are useful
for adhering two surfaces to one another. More specifically,
the adhesives will find application where biological
compatibility of the adhesive is desired. An example of a
biological application involves use in surgical procedures
where tissues must be held in fixed positions during or
following the procedure. In this application, the surfaces of
the adhesive will typically be membranes which are
compatible with the tissues to which they are attached.

A particular advantage of the adhesives of the present
invention is that when they are formed in an orientation
specific manner, the adhesive portions will be "self-finding,"
that is, the system will go to the thermodynamic equilibrium
in which the two sides are matched in the predetermined,
orientation specific manner.

EXAMPLES

Example 1 

This example illustrates the general synthesis of an array
of unimolecular, double-stranded oligonucleotides on a solid
support.

Unimolecular double stranded DNA molecules were
synthesized on a solid support using standard light-directed
methods (VLSIPS™ protocols). Two hexaethylene glycol
(PEG) linkers were used to covalently attach the synthesized
oligonucleotides to the derivatized glass surface. Synthesis
of the first (inner) strand proceeded one nucleotide at a time
using repeated cycles of photo-deprotection and chemical
coupling of protected nucleotides. The nucleotides each had
a protecting group on the base portion of the monomer as
well as a photolabile MeNPoc protecting group on the 3'
hydroxyl. Upon completion of the inner strand, another
MeNPoc-protected PEG linker was covalently attached to
the 5' end of the surface-bound oligonucleotide. After
addition of the internal PEG linker, the PEG was photodeprotected,
and the synthesis of the second strand proceeded in the
normal fashion. Following the synthesis cycles, the DNA
bases were deprotected using standard protocols. The
sequence of the second (outer) strand, being complementary
to that of the inner strand, provided molecules with short,
hydrogen bonded, unimolecular double-stranded structure
as a result of the presence of the internal flexible PEG linker.

An array of 16 different molecules was synthesized on a
derivatized glass slide in order to determine whether short,
unimolecular DNA structures could be formed on a surface
and whether they could adopt structures that are recognized
by proteins. Each of the 16 different molecular species
occupies a different physical region on the glass surface so
that there is a one-to-one correspondence between molecular
identity and physical location. The molecules are of the form

S-P-P-C-C-A/T-A/T-A/T-A/T-G-C-P-G-C-A/T-A/T-A/T-A/T-G-G-F

where S is the solid surface having silyl groups, P is a PEG
linker, A, C, G, and T are the DNA nucleotides, and F is a
fluorescent tag. The DNA sequence is listed from the 3' to
the 5' end (the 3' end of the DNA molecule is attached to the
solid surface via a silyl group and 2 PEG linkers). The
sixteen molecules synthesized on the solid support differed
in the various permutations of A and T in the above formula.

Example 2

This example illustrates the ability of a library of
surface-bound, unimolecular, double-stranded oligonucleotides to
exist in duplex form and to be recognized and bound by a
protein.

A library of 16 different members was prepared as
described in Example 1. The 16 molecules all have the same
composition (same number of As, Cs, Gs and Ts), but the
order is different. Four of the molecules have an outer strand
that is 100% complementary to the inner strand (these
molecules will be referred to as DS, double-stranded, below).
One of the four DS oligonucleotides has a sequence that is
recognized by the restriction enzyme EcoRI. If the molecule
can loop back and form a DNA duplex, it should be
recognized and cut by the restriction enzyme, thereby releasing
the fluorescent tag. Thus, the action of the enzyme
provided a functional test for DNA structure, and also served
to demonstrate that these structures can be recognized at the
surface by proteins. The remaining 12 molecules had outer
strands that were not complementary to their inner strands
(referred to as SS, single-stranded, below). Of these, three
had an outer strand and three had an inner strand whose
sequence was an EcoRI half-site (the sequence on one
strand was correct for the enzyme, but the other half was
not). The solid support with an array of molecules on the
surface is referred to as a "chip" for the purposes of the
following discussion. The presence of fluorescently labelled
molecules on the chip was detected using confocal fluorescence
microscopy. The action of various enzymes was
determined by monitoring the change in the amount of
fluorescence from the molecules on the chip surface (e.g.,
"reading" the chip) upon treatment with enzymes that can
cut the DNA and release the fluorescent tag at the 5' end.

The three different enzymes used to characterize the
structure of the molecules on the chip were:

1) Mung Bean Nuclease: sequence independent,
single-strand specific DNA endonuclease;

2) DNase I: sequence independent, double-strand
specific endonuclease;

3) EcoRI: restriction endonuclease that recognizes the
sequence (5'-3') GAATTC in double stranded DNA, and cuts
between the G and the first A.

Mung Bean Nuclease and EcoRI were
obtained from New England Biolabs, and DNase I was
obtained from Boehringer Mannheim. All enzymes were
used at a concentration of 200 units per mL in the buffer
recommended by the manufacturer. The enzymatic reactions
were performed in a 1 mL flow cell at 22° C. and were
typically allowed to proceed for 90 minutes.
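
The DS and SS categories and the EcoRI site and half-site distinctions used in this example can be expressed as a small classifier over an inner and an outer strand. This is a sketch for illustration only: the function names are assumptions, the strands are written 5' to 3' (the reverse of the 3' to 5' listing in Example 1), and the three sequence pairs shown merely fit the general formula of Example 1 and are not asserted to be the sequences actually synthesized.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ECORI_SITE = "GAATTC"  # recognition sequence, 5'->3'


def reverse_complement(seq):
    """Return the fully complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def classify(inner, outer):
    """Classify a molecule from its inner and outer strands (both 5'->3').

    Returns (strandedness, site), where strandedness is "DS" when the
    outer strand is fully complementary to the inner strand, else "SS".
    """
    strandedness = "DS" if outer == reverse_complement(inner) else "SS"
    on_inner = ECORI_SITE in inner
    on_outer = ECORI_SITE in outer
    if on_inner and on_outer:
        site = "full site"
    elif on_outer:
        site = "half-site on outer strand"
    elif on_inner:
        site = "half-site on inner strand"
    else:
        site = "no site"
    return strandedness, site


# Hypothetical strand pairs consistent with the formula of Example 1.
print(classify("CGAATTCC", "GGAATTCG"))  # ('DS', 'full site')
print(classify("CGATATCC", "GGAATTCG"))  # ('SS', 'half-site on outer strand')
print(classify("CGATATCC", "GGATATCG"))  # ('DS', 'no site')
```

Only a molecule reported as DS with a full site would be expected to be cut by EcoRI as a unimolecular duplex; the other categories correspond to the control regions discussed in this example.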

Upon treatment of the chip with the enzyme EcoRI, the
fluorescence signal in the DS EcoRI region and the 3 SS
regions with the EcoRI half-site on the outer strand was
reduced by about 10% of its initial value. This reduction was
at least 5 times greater than for the other regions of the chip,
indicating that the action of the enzyme is sequence specific
on the chip. It was not possible to determine if the factor is
greater than 5 in these preliminary experiments because of
uncertainty in the constancy of the fluorescence background.
However, because the purpose of these early experiments
was to determine whether unimolecular double-stranded
structures could be formed and whether they could be
specifically recognized by proteins (and not to provide a
quantitative measure of enzyme specificity), qualitative
differences between the different synthesis regions were
sufficient.

The reduction in signal in the 3 SS regions with the EcoRI
half-site on the outer strand indicated either that the enzyme
cuts single-stranded DNA with a particular sequence, or that
these molecules formed a double-stranded structure that was
recognized by the enzyme. The molecules on the chip
surface were at a relatively high density, with an average
spacing of approximately 100 angstroms. Thus, it was
possible for the outer strand of one molecule to form a
double-stranded structure with the outer strand of a
neighboring molecule. In the case of the 3 SS regions with the
EcoRI half-site on the outer strand, such a bimolecular
double-stranded region would have the correct sequence and
structure to be recognized by EcoRI. However, it would
differ from the unimolecular double-stranded molecules in
that the inner strand remains single-stranded and thus
amenable to cleavage by a single-strand specific endonuclease
such as Mung Bean Nuclease. Therefore, it was possible to
distinguish unimolecular from bimolecular double-stranded
DNA molecules on the surface by their ability to be cut by
single and double-strand specific endonucleases.

In order to remove all molecules that have single-stranded
structures and to identify unimolecular double-stranded
molecules, the chip was first exhaustively treated with Mung
Bean Nuclease. The reduction in the fluorescence signal was
greater by about a factor of 2 for the SS regions of the chip,
including those with the EcoRI half-site on the outer strand
that were cleaved by EcoRI, than for the 4 DS regions.
Following Mung Bean Nuclease treatment, the chip was
treated with either DNase I (which cuts all remaining
double-stranded molecules) or EcoRI (which should cut
only the remaining double-stranded molecules with the
correct sequence). Upon treatment with DNase I, the
fluorescence signal in the 4 DS regions was reduced by at least
5-fold more than the signal in the SS regions. Upon EcoRI
treatment, the signal in the single DS region with the correct
EcoRI sequence was reduced by at least a factor of 3 more
than the signal in any other region on the chip. Taken
together, these results indicated that the surface-bound
molecules synthesized with two complementary strands separated
by a flexible PEG linker form intramolecular
double-stranded structures that were resistant to a single-strand
specific endonuclease and were recognized by both a
double-strand specific endonuclease and a sequence-specific
restriction enzyme.

What is claimed is: 

1. A synthetic unimolecular, double-stranded oligonucleotide
library comprising a plurality of different members,
each member having the formula:

Y-L¹-X¹-L²-X²

wherein,
Y is a solid support;
X¹ and X² are a pair of complementary oligonucleotides;
L¹ is a spacer;
L² is a linking group having sufficient length such that X¹
and X² form a double-stranded oligonucleotide.

2. A library in accordance with claim 1, wherein L² is a
polyethylene glycol group.

3. A library in accordance with claim 1, wherein X¹ and
X² are complementary oligonucleotides each comprising
from 6 to 30 nucleic acid monomers.

4. A library in accordance with claim 1, wherein said solid
support is a silica support and L¹ comprises an
aminoalkylsilane and from 1 to 4 hexaethyleneglycols.

5. A library in accordance with claim 1, wherein said solid
support is a silica support, L¹ comprises an aminoalkylsilane
and from 1 to 4 hexaethyleneglycols, L² is a
polyethyleneglycol group and X¹ and X² are complementary
oligonucleotides each comprising from 6 to 30 nucleic acid
monomers.

6. A synthetic unimolecular, double-stranded oligonucleotide
library of claim 1, wherein a portion of said
double-stranded oligonucleotides formed by X¹ and X² further
comprise a loop.
ARRAYS OF MATERIALS ATTACHED TO A SUBSTRATE

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a division of U.S. patent application Ser. No. 08/390,272, filed Feb. 16, 1995, now U.S. Pat. No. 5,489,678, which is a continuation of U.S. patent application Ser. No. 07/624,120, filed Dec. 6, 1990, now abandoned, which is a continuation-in-part of U.S. patent application Ser. No. 07/492,462, filed Mar. 7, 1990, now U.S. Pat. No. 5,143,854, which is a continuation-in-part of U.S. patent application Ser. No. 07/362,901, filed Jun. 7, 1989, now abandoned, and hereby incorporated herein by reference for all purposes. This application is also a continuation-in-part of U.S. patent application Ser. No. 08/456,887, filed Jun. 1, 1995, which is a division of U.S. patent application Ser. No. 07/954,646, filed Sep. 30, 1992, now U.S. Pat. No. 5,445,934, which is a division of U.S. patent application Ser. No. 07/850,356, filed Mar. 12, 1992, now U.S. Pat. No. 5,405,783, which is a division of U.S. patent application Ser. No. 07/492,462, filed Mar. 7, 1990, now U.S. Pat. No. 5,143,854, which is a continuation-in-part of U.S. patent application Ser. No. 07/362,901 filed Jun. 7, 1989, now abandoned.

This application is also related to U.S. patent application Ser. No. 08/670,118 filed Jun. 25, 1996, which is a division of U.S. patent application Ser. No. 08/168,104, filed Dec. 15, 1993, which is a continuation of U.S. patent application Ser. No. 07/624,114 filed Dec. 6, 1990, now abandoned, and U.S. patent application Ser. No. 07/626,730, filed Dec. 6, 1990, now U.S. Pat. No. 5,547,839, and also incorporated herein by reference for all purposes.

COPYRIGHT NOTICE

[...] disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

The present invention relates to the field of polymer synthesis. More specifically, the invention provides a reactor system, a masking strategy, photoremovable protective groups, data collection and processing techniques, and applications for light directed synthesis of diverse polymer sequences on substrates.

SUMMARY OF THE INVENTION

Methods, apparatus, and compositions for synthesis and use of diverse polymer sequences on a substrate are disclosed, as well as applications thereof.

According to one aspect of the invention, an improved reactor system for synthesis of diverse polymer sequences [...] for delivering selected reaction fluids [...] at least a first relative location relative to a second relative location; a light for illuminating the substrate [...] programmed digital computer for selectively directing a flow of fluids from the reactor system, selectively activating the translation stage, and selectively illuminating the substrate so as to form a plurality of diverse polymer sequences on the substrate at predetermined locations.

The invention also provides a technique for selection of linker molecules in a very large scale immobilized polymer synthesis (VLSIPS™) method. According to this aspect of the invention, the invention provides a method of screening a plurality of linker polymers for use in binding affinity studies. The invention includes the steps of forming a plurality of linker polymers on a substrate in selected regions, the linker polymers formed by the steps of recursively: on a surface of a substrate, irradiating a portion of the selected regions to remove a protective group, and contacting the surface with a monomer; contacting the plurality of linker polymers with a ligand; and contacting the ligand with a labeled receptor.

According to another aspect of the invention, improved photoremovable protective groups are provided. According to this aspect of the invention a compound having the formula:

[chemical structure]

wherein n=0 or 1; Y is selected from the group consisting of an oxygen of the carboxyl group of a natural or unnatural amino acid, an amino group of a natural or unnatural amino acid, or the C-5' oxygen group of a natural or unnatural deoxyribonucleic or ribonucleic acid; R¹ and R² independently are a hydrogen atom, a lower alkyl, aryl, benzyl, [...]

[...] VLSIPS™ methodology. According to one [...] technique, the invention provides an ordered method for forming a plurality of polymer sequences by sequential addition of reagents comprising the step of serially protecting and deprotecting portions of the plurality of polymer sequences for addition of other portions of the polymer sequences using a binary synthesis strategy.

Improved data collection equipment and techniques are also provided. According to one embodiment, the instrumentation provides a system for determining affinity of a receptor to a ligand comprising: means for applying light to a surface of a substrate, the substrate comprising a plurality of ligands at predetermined locations, the means for providing simultaneous illumination at a plurality of the predetermined locations; and an array of detectors for detecting light fluoresced at the plurality of predetermined locations. The invention further provides for improved data analysis techniques including the steps of exposing [...] receptors to a substrate, the substrate comprising a plurality [...] data collection points within each of the regions, determin[...] points; removing the data collection points deviating from a predetermined statistical distribution; and determining a [...] collection points.
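The data analysis steps summarized above (several data collection points per region, removal of points deviating from a predetermined statistical distribution, and a value determined from the remaining points) can be illustrated with a short sketch. It is only a sketch under stated assumptions: the summary does not specify the distribution or the rejection rule, so the code assumes a point is rejected when it lies more than `max_sigma` standard deviations from its region's mean, and it reports each region relative to the brightest region. The function names, `max_sigma`, and the sample intensities are hypothetical.

```python
from statistics import mean, pstdev


def trimmed_region_intensity(points, max_sigma=2.0):
    """Average one region's intensity readings after dropping outliers.

    Assumption: "deviating from a predetermined statistical distribution"
    is modelled as lying more than max_sigma standard deviations from
    the region mean.
    """
    mu = mean(points)
    sigma = pstdev(points)
    if sigma == 0:
        return mu  # all readings identical, nothing to reject
    kept = [p for p in points if abs(p - mu) <= max_sigma * sigma]
    return mean(kept)


def relative_affinities(regions, max_sigma=2.0):
    """Map region name -> cleaned intensity divided by the brightest region."""
    cleaned = {
        name: trimmed_region_intensity(pts, max_sigma)
        for name, pts in regions.items()
    }
    top = max(cleaned.values())
    return {name: value / top for name, value in cleaned.items()}


if __name__ == "__main__":
    # Hypothetical readings; the 500 in the first region is a stray point.
    sample = {
        "YGGFL": [100, 102, 98, 101, 99, 500],
        "PGGFL": [10, 11, 9, 10, 10, 10],
    }
    print(relative_affinities(sample))
```

A single rejection pass is used here for simplicity; an iterative rejection or a fit to a specific distribution would be an equally plausible reading of the text.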
Protected amino acid N-carboxy anhydrides for use in polymer synthesis are also disclosed. According to this aspect, the invention provides a compound having the formula:

[chemical structure]

where R is a side chain of a natural or unnatural amino acid and X is a photoremovable protecting group.

A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates light-directed spatially-addressable parallel chemical synthesis;

FIG. 2 schematically illustrates one example of light-directed peptide synthesis;

FIG. 3 is a three-dimensional representation of a portion of the checkerboard array of YGGFL and PGGFL;

FIG. 4 schematically illustrates an automated system for synthesizing diverse polymer sequences;

FIGS. 5a and 5b illustrate operation of a program for polymer synthesis;

FIGS. 6a and 6b are a schematic illustration of a [...] binary masking strategy;

FIGS. 7a and 7b are a schematic illustration of a gray code binary masking strategy;

FIGS. 8a and 8b are a schematic illustration of a modified gray code binary masking strategy;

FIG. 9a schematically illustrates a masking scheme for a four step synthesis;

FIG. 9b schematically illustrates synthesis of all 400 peptide dimers;

FIG. 10 is a coordinate map for the ten-step binary synthesis;

FIG. 11 schematically illustrates a data collection system;

FIG. 12 is a block diagram illustrating the architecture of the data collection system;

FIG. 13 is a flow chart illustrating operation of software for the data collection/analysis system; and

FIG. 14 illustrates a three-dimensional plot of intensity versus position for light directed [...].

DESCRIPTION OF THE PREFERRED EMBODIMENTS

CONTENTS

I. Definitions
II. General
  A. Deprotection and Addition
    1. Example
    2. Example
  B. Antibody recognition
    1. Example
III. Synthesis
  A. Reactor System
  B. Binary Synthesis Strategy
    1. Example
    2. Example
    3. Example
    4. Example
    5. Example
    6. Example
  C. Linker Selection
  D. Protecting Groups
    1. Use of Photoremovable Groups During Solid-Phase Synthesis of Peptides
    2. Use of Photoremovable Groups During Solid-Phase Synthesis of Oligonucleotides
  E. Amino Acid N-Carboxy Anhydrides Protected with a Photoremovable Group
IV. Data Collection
  A. Data Collection System
  B. Data Analysis
V. Other Representative Applications
  a. Oligonucleotide Synthesis
VI. Conclusion

I. DEFINITIONS

Certain terms used herein are intended to have the following general definitions:

1. Complementary:
Refers to the topological compatibility or matching together of interacting surfaces of a ligand molecule and its receptor. Thus, the receptor and its ligand can be described as complementary, and furthermore, the contact surface characteristics are complementary to each other.

2. Epitope:
The portion of an antigen molecule which is delineated by the area of interaction with the subclass of receptors known as antibodies.

3. Ligand:
A ligand is a molecule that is recognized by a particular receptor. Examples of ligands that can be investigated by this invention include, but are not restricted to, agonists and antagonists for cell membrane receptors, toxins and venoms, viral epitopes, hormones, hormone receptors, peptides, enzymes, enzyme substrates, cofactors, drugs (e.g., opiates, steroids, etc.), lectins, sugars, oligonucleotides, nucleic acids, oligosaccharides, proteins, and monoclonal antibodies.

4. Monomer:
A member of the set of small molecules which can be joined together to form a polymer. The set of monomers includes but is not restricted to, for example, the set of common L-amino acids, the set of D-amino acids, the set of synthetic amino acids, the set of [...] pentoses and hexoses. As used herein, monomers refers to any member of a basis set for synthesis of a polymer. For example, dimers of the 20 naturally occurring L-amino acids form a basis set of 400 monomers for synthesis of polypeptides. Different basis sets of monomers may be used at successive steps in the synthesis of a polymer. Furthermore, each of the sets may include protected members which are modified after synthesis.

5. Peptide:
A polymer in which the monomers are alpha amino acids and which are joined together through amide bonds and alternatively referred to as a polypeptide. In the context of this specification it should be appreciated that the amino acids may be the L-optical isomer or the D-optical isomer.
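The monomer definition above states that dimers of the 20 naturally occurring L-amino acids form a basis set of 400 monomers. A minimal sketch of that count follows; the one-letter codes and the name `dimer_basis_set` are illustrative choices and are not part of the specification.

```python
from itertools import product

# One-letter codes of the 20 common L-amino acids (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dimer_basis_set(monomers=AMINO_ACIDS):
    """Return every ordered pair of monomers as a two-letter dimer."""
    return ["".join(pair) for pair in product(monomers, repeat=2)]


if __name__ == "__main__":
    dimers = dimer_basis_set()
    print(len(dimers))  # number of dimers in the basis set
    print(dimers[:5])   # first few dimers in enumeration order
```

Ordered pairs are used because a dimer such as GF differs from FG, which is why the count is 20 × 20 rather than the number of unordered pairs.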
Peptides are often two or more amino acid monomers long, and often more than 20 amino acid monomers long. Standard abbreviations for amino acids are used (e.g., P for proline). These abbreviations are included in Stryer, [...], which is incorporated herein by reference for all purposes.

6. Radiation:
Energy which may be selectively applied including energy having a wavelength of between 10⁻¹⁴ and 10⁴ meters including, for example, electron beam radiation, gamma radiation, x-ray radiation, ultraviolet radiation, visible light, infrared radiation, microwave radiation, and radio waves. "Irradiation" refers to the application of radiation to a surface.

7. Receptor:
A molecule that has an affinity for a given ligand. Receptors may be naturally-occurring or manmade molecules. Also, they can be employed in their unaltered state or as aggregates with other species. Receptors may be attached, covalently or noncovalently, to a binding member, either directly or via a specific binding substance. Examples of receptors which can be employed by this invention include, but are not restricted to, antibodies, cell membrane receptors, monoclonal antibodies and antisera reactive with specific antigenic determinants (such as on viruses, cells or other materials), drugs, polynucleotides, nucleic acids, peptides, cofactors, lectins, sugars, polysaccharides, cells, cellular membranes, and organelles. Receptors are sometimes referred to in the art as anti-ligands. As the term receptors is used herein, no difference in meaning is intended. A "Ligand Receptor Pair" is formed when two macromolecules have combined through molecular recognition to form a complex. Other examples of receptors which can be investigated by this invention include but are not restricted to:

a) Microorganism receptors:
Determination of ligands which bind to receptors, such as specific transport proteins or enzymes essential to survival of microorganisms, is useful in developing a new class of antibiotics. Of particular value would be antibiotics against opportunistic fungi, protozoa, and those bacteria resistant to the antibiotics in current use.

b) Enzymes:
For instance, one type of receptor is the binding site of enzymes such as the enzymes responsible for cleaving neurotransmitters; determination of ligands which bind to certain receptors to modulate the action of the enzymes which cleave the different neurotransmitters is useful in the development of drugs which can be used in the treatment of disorders of neurotransmission.

c) Antibodies:
For instance, the invention may be useful in investigating the ligand-binding site on the antibody molecule which combines with the epitope of an antigen of interest; determining a sequence that mimics an antigenic epitope may lead to the development of vaccines of which the immunogen is based on one or more of such sequences or lead to the development of related diagnostic agents or compounds useful in therapeutic treatments such as for auto-immune diseases (e.g., by blocking the binding of the "self" antibodies).

d) Nucleic Acids:
Sequences of nucleic acids may be synthesized to [...] DNA or RNA binding sequences.

e) Catalytic Polypeptides:
Polymers, preferably polypeptides, which are capable of promoting a chemical reaction involving the conversion of one or more reactants to one or more products. Such polypeptides generally include a binding site specific for at least one reactant or reaction intermediate, and an active functionality proximate to the binding site, which functionality is capable of chemically modifying the bound reactant. Catalytic polypeptides are described in, for example, U.S. Pat. No. 5,215,899, which is incorporated herein by reference for all purposes.

f) Hormone receptors:
Examples of hormone receptors include, e.g., the receptors for insulin and growth hormone. Determination of the ligands which bind with high affinity to a receptor is useful in the development of, for example, an oral replacement of the daily injections which diabetics must take to relieve the symptoms of diabetes, and in the other case, a replacement for the scarce human growth hormone which can only be obtained from cadavers or by recombinant DNA technology. Other examples are the vasoconstrictive hormone receptors; determination of those ligands which bind to a receptor may lead to the development of drugs to control blood pressure.

g) Opiate receptors:
Determination of ligands which bind to the opiate receptors in the brain is useful in the development of less-addictive replacements for morphine and related drugs.

8. Substrate:
A material having a rigid or semi-rigid surface. In many embodiments, at least one surface of the substrate will be substantially flat, although in some embodiments it may be desirable to physically separate synthesis regions for different polymers with, for example, wells, raised regions, etched trenches, or the like. According to other embodiments, small beads may be provided on the surface which may be released upon completion of the synthesis.

9. Protective Group:
A material which is chemically bound to a monomer unit and which may be removed upon selective exposure to an activator such as electromagnetic radiation. Examples of protective groups with utility herein include those comprising nitropiperonyl, pyrenylmethoxy-carbonyl, nitroveratryl, nitrobenzyl, dimethyl dimethoxybenzyl, 5-bromo-7-nitroindolinyl, o-hydroxy-α-methyl cinnamoyl, and 2-oxymethylene anthraquinone.

10. Predefined Region:
A predefined region is a localized area on a surface which is, was, or is intended to be activated for formation of a polymer. The predefined region may have any convenient shape, e.g., circular, rectangular, elliptical, wedge-shaped, etc. For the sake of brevity herein, "predefined regions" are sometimes referred to simply as "regions."

11. Substantially Pure:
A polymer is considered to be "substantially pure" within a predefined region of a substrate when it exhibits characteristics that distinguish it from other predefined regions. Typically, purity will be measured in terms of biological activity or function as a result of uniform sequence. Such characteristics will typically be measured by way of binding with a selected ligand or receptor.

12. Activator refers to an energy source adapted to render a group active and which is directed from a source to a predefined location on a substrate. A primary illustration of
an activator is light. Other examples of activators include ion beams, electric fields, magnetic fields, electron beams, x-ray, and the like.

13. Binary Synthesis Strategy refers to an ordered strategy for parallel synthesis of diverse polymer sequences by sequential addition of reagents which may be represented by a reactant matrix, and a switch matrix, the product of which is a product matrix. A reactant matrix is a 1×n matrix of the building blocks to be added. The elements of the switch matrix are binary numbers. In preferred embodiments, a binary strategy is one in which at least two successive steps illuminate half of a region of interest on the substrate. In most preferred embodiments, binary synthesis refers to a synthesis strategy which also [...] step. For example, a strategy in which a switch [...] about half of previously protected regions and illuminating about half of previously protected regions). It will be recognized that binary rounds may be interspersed with non-binary rounds and that only a portion of a substrate may be subjected to a binary scheme, but will still be considered to be a binary masking scheme within the definition herein. A binary "masking" strategy is a binary synthesis which uses light to remove protective groups from materials for addition of other materials such as amino acids. In preferred embodiments, selected columns of the switch matrix are arranged in order of increasing binary numbers in the columns of the switch matrix.

14. Linker refers to a molecule or group of molecules attached to a substrate and spacing a synthesized polymer from the substrate for exposure/binding to a receptor.

II. General

The present invention provides synthetic strategies and devices for the creation of large scale chemical diversity. Solid-phase chemistry, photolabile protecting groups, and photolithography are brought together to achieve light-directed spatially-addressable parallel chemical synthesis in preferred embodiments.

The invention is described herein for purposes of illustration primarily with regard to the preparation of peptides and nucleotides, but could readily be applied in the preparation of other polymers. Such polymers include, for example, both linear and cyclic polymers of nucleic acids, polysaccharides, phospholipids, and peptides having either α-, β-, or ω-amino acids, heteropolymers in which a known drug is covalently bound to any of the above, polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, polyacetates, or other polymers which will be apparent upon review of this disclosure. It will be recognized further, that illustrations herein are primarily with reference to C- to N-terminal synthesis, but the invention could readily be applied to N- to C-terminal synthesis without departing from the scope of the invention.

A. Deprotection and Addition

The present invention uses a masked light source or other activator to direct the simultaneous synthesis of many different chemical compounds. FIG. 1 is a flow chart illustrating the process of forming chemical compounds according to one embodiment of the invention. Synthesis occurs on a solid support 2. A pattern of illumination through a mask 4a using a light source 6 determines which regions of the support are activated for chemical coupling. In one preferred embodiment activation is accomplished by using light to remove photolabile protecting groups from selected areas of the substrate.

After deprotection, a first of a set of building blocks (indicated by "A" in FIG. 1), each bearing a photolabile protecting group (indicated by "X") is exposed to the surface of the substrate and it reacts with regions that were addressed by light in the preceding step. The substrate is then illuminated through a second mask 4b, which activates another region for reaction with a second protected building block "B". The pattern of masks used in these illuminations and the sequence of reactants define the ultimate products and their locations, resulting in diverse sequences at predefined locations, as shown with the sequences ACEG and [...] portion of FIG. 1. Preferred embodiments of the invention take advantage of combinatorial masking strategies to form a large number of compounds in a small number of chemical [...] spatial addressability of the activator, in one case the diffraction of light. Each compound is physically accessible and its position is precisely known. Hence, the array is spatially-addressable and its interactions with other molecules can be assessed.

In a particular embodiment shown in FIG. 1, the substrate contains amino groups that are blocked with a photolabile protecting group. Amino acid sequences are made accessible for coupling to a receptor by removal of the photoprotective groups.

When a polymer sequence to be synthesized is, for example, a polypeptide, amino groups at the ends of linkers attached to a glass substrate are derivatized with nitroveratryloxycarbonyl (NVOC), a photoremovable protecting group. The linker molecules may be, for example, aryl acetylene, ethylene glycol oligomers containing from 2-10 monomers, diamines, diacids, amino acids, or combinations thereof. Photodeprotection is effected by illumination of the substrate through, for example, a mask wherein the pattern has transparent regions with dimensions of, for example, less than 1 cm², 10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm², 10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm², 10⁻⁸ cm², or 10⁻¹⁰ cm². In a preferred embodiment, the regions are between about 10×10 μm and 500×500 μm. According to some embodiments, the masks are arranged to produce a checkerboard array of polymers, although any one of a variety of geometric configurations may be utilized.

1. Example

In one example of the invention, free amino groups were fluorescently labelled by treatment of the entire substrate surface with fluorescein isothiocyanate (FITC) after photodeprotection. Glass microscope slides were cleaned, aminated by treatment with 0.1% aminopropyltriethoxysilane in 95% ethanol, and incubated at 110° C. for 20 min. The aminated surface of the slide was then exposed to a 30 mM solution of the N-hydroxysuccinimide ester of NVOC-GABA (nitroveratryloxycarbonyl-γ-amino butyric acid) in DMF. The NVOC protecting group was photolytically removed by imaging the 365 nm output from a Hg arc lamp through a chrome on glass 100 μm checkerboard mask onto the substrate for 20 min at a power density of 12 mW/cm². The exposed surface was then treated with 1 mM FITC in DMF. The substrate surface was scanned in an epi-fluorescence microscope (Zeiss Axioskop 20) using 488 nm excitation from an argon ion laser (Spectra-Physics model 2025). The fluorescence emission above 520 nm was detected by a cooled photomultiplier (Hamamatsu 943-02) operated in a photon counting mode. Fluorescence intensity was translated into a color display with red in the highest intensity and black in the lowest intensity areas. The pres-
ence of a high-contrast fluorescent checkerboard pattern of 100×100 μm elements revealed that free amino groups were generated in specific regions by spatially localized photodeprotection.

2. Example

FIG. 2 is a flow chart illustrating another example of the invention. Carboxy-activated NVOC-leucine was allowed to react with an aminated substrate. The carboxy activated HOBT ester of leucine and other amino acids used in this synthesis was formed by mixing 0.25 mmol of the NVOC amino protected amino acid with 37 mg HOBT (1-hydroxybenzotriazole), 111 mg BOP (benzotriazolyl-n-oxy-tris (dimethylamino)-phosphoniumhexafluorophosphate) and 86 μl DIEA (diisopropylethylamine) in 2.5 ml DMF. The NVOC protecting group was removed by uniform illumination. Carboxy-activated NVOC-phenylalanine was coupled to the exposed amino groups for 2 hours at room temperature, and then washed with DMF and methylene chloride. Two unmasked cycles of photodeprotection and coupling with carboxy-activated NVOC-glycine were carried out. The surface was then illuminated through a chrome on glass 50 μm checkerboard pattern mask. Carboxy-activated Nα-tBOC-O-tButyl-L-tyrosine was then added. The entire surface was uniformly illuminated to photolyze the remaining NVOC groups. Finally, carboxy-activated NVOC-L-proline was added, the NVOC group was removed by illumination, and the t-BOC and t-butyl protecting groups were removed with TFA. After removal of the protecting groups, the surface consisted of a 50 μm checkerboard array of Tyr-Gly-Gly-Phe-Leu (YGGFL) (Seq. ID No:1) and Pro-Gly-Gly-Phe-Leu (PGGFL) (Seq. ID No: [...]).

B. Antibody Recognition

In one preferred embodiment the substrate is used to determine which of a plurality of amino acid sequences is recognized by an antibody of interest.

1. Example

In one example, the array of pentapeptides in the example illustrated in FIG. 2 was probed with a mouse monoclonal antibody directed against β-endorphin. This antibody (called 3E7) is known to bind YGGFL and YGGFM (Seq. ID No:21) with nanomolar affinity and is discussed in Meo et al., Proc. Natl. Acad. Sci. USA (1983) 80:4084, which is incorporated by reference herein for all purposes. This antibody requires the amino terminal tyrosine for high affinity binding. The array of peptides formed as described in FIG. 2 was incubated with a 2 μg/ml mouse monoclonal antibody (3E7) known to recognize YGGFL. 3E7 does not bind PGGFL. A second incubation with fluoresceinated goat anti-mouse antibody labeled the regions that bound 3E7. The surface was scanned with an epi-fluorescence microscope. The results showed alternating bright and dark 50 μm squares indicating that YGGFL and PGGFL were synthesized in geometric array determined by the mask. A high contrast (>12:1 intensity ratio) fluorescence checkerboard image shows that (a) YGGFL and PGGFL were synthesized in alternate 50 μm squares, (b) YGGFL attached to the surface is accessible for binding to antibody 3E7, and (c) antibody 3E7 does not bind to PGGFL.

A three-dimensional representation of the fluorescence intensity data in a portion of the checkerboard is shown in FIG. 3. This figure shows that the border between synthesis sites is sharp. The height of each spike in this display is linearly proportional to the integrated fluorescence intensity in a 2.5 μm pixel. The transition between PGGFL and YGGFL occurs within two spikes (5 μm). There is little variation in the fluorescence intensity of different YGGFL squares. The mean intensity of sixteen YGGFL synthesis sites was 2.03×10⁵ counts and the standard deviation was 9.6×10³ counts.

III. Synthesis

A. Reactor System

FIG. 4 schematically illustrates a device used to synthesize diverse polymer sequences on a substrate. The substrate, the area of synthesis, and the area for synthesis of each individual polymer could be of any size or shape. For example, squares, ellipsoids, rectangles, triangles, circles, or portions thereof, along with irregular geometric shapes may be utilized. Duplicate synthesis areas may also be applied to a single substrate for purposes of redundancy.

In one embodiment, the predefined regions on the substrate will have a surface area of between about 1 cm² and 10⁻¹⁰ cm². In some embodiments the regions have areas of less than about 10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm², 10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm², 10⁻⁸ cm², or 10⁻¹⁰ cm². In a preferred embodiment, the regions are between about 10×10 μm and 500×500 μm.

In some embodiments a single substrate supports more than about 10 different monomer sequences and preferably more than about 100 different monomer sequences, although in some embodiments more than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, or 10⁸ different sequences are provided on a substrate. Of course, within a region of the substrate in which a monomer sequence is synthesized, it is preferred that the monomer sequence be substantially pure. In some embodiments, regions of the substrate contain polymer sequences which are at least about 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99% pure. The device includes an automated peptide synthesizer 401. The automated peptide synthesizer is a device which flows selected reagents through a flow cell 402 under the direction of a computer 404. In a preferred embodiment the synthesizer is an ABI Peptide Synthesizer, model no. 431A. The computer may be selected from a wide variety of computers or discrete logic including, for example, an IBM PC-AT or similar computer linked with appropriate internal control systems in the peptide synthesizer. The PC is provided with signals from the board computer indicative of, for example, the end of a coupling cycle.

Substrate 406 is mounted on the flow cell, forming a cavity between the substrate and the flow cell. Selected reagents flow through this cavity from the peptide synthesizer at selected times, forming an array of peptides on the face of the substrate in the cavity. Mounted above the substrate, and preferably in contact with the substrate is a mask 408. Mask 408 is transparent in selected regions to a selected wavelength of light and is opaque in other regions to the selected wavelength of light. The mask is illuminated with a light source 410 such as a UV light source. In one specific embodiment the light source 410 is a model no. 82420 made by Oriel. The mask is held and translated by an x-y-z translation stage 412 such as an x-y translation stage made by Newport Corp. The computer coordinates action of the peptide synthesizer, x-y translation stage, and light source. Of course, the invention may be used in some embodiments with translation of the substrate instead of the mask.

In operation, the substrate is mounted on the reactor cavity. The slide, with its surface protected by a suitable photo removable protective group, is exposed to light at selected locations by positioning the mask and illuminating the light source for a desired period of time (such as, for example, 1 sec to 60 min in the case of peptide synthesis). A selected peptide or other monomer/polymer is pumped
through the reactor cavity by the peptide synthesizer for    in a synthesis region. A substrate formed with mixtures of

binding at the selected locations on the substrate. After a    compounds in various synthesis regions may be used to

selected reaction time (such as about 1 sec to 300 min in the    perform, for example, an initial screening of a large number

case of peptide reactions) the monomer is washed from    of compounds, after which a smaller number of compounds

the system, the mask is appropriately repositioned or    in regions which exhibit high binding affinity are further

replaced, and the cycle is repeated. In most embodiments of    screened. Similar results may be obtained by only partially

the invention, reactions may be [...]    photolyzing a region, adding a first monomer, re-photolyzing

    the same region, and exposing the region to a second monomer.

FIGS. 5a and 5b are flow charts of the software used in

operation of the reactor system. At step 502 the peptide    In a light-directed chemical synthesis, the products

synthesis software is initialized. At step 504 the system    formed depend on the pattern and order of masks, and on the

calibrates positioners on the x-y translation stage and begins    order of reactants. To make a set of products there will in

a main loop. At step 506 the system determines which, if    general be "n" possible masking schemes. In preferred

any, of the function keys on the computer have been pressed.    embodiments of the invention herein a binary synthesis

If F1 has been pressed, the system prompts the user for input    strategy is utilized. The binary synthesis strategy is illus-

of a desired synthesis process. If the user enters F2, the    trated herein primarily with regard to a masking strategy,

system allows a user to edit a file for a synthesis process at    although it will be applicable to other polymer synthesis

step 510. If the user enters F3 the system loads a process    strategies such as the pin strategy, and the like.

from a disk at step 512. If the user enters F4 the system saves    In a binary synthesis strategy, the substrate is irradiated

an entered or edited process to disk at step 514. If the user    with a first mask, exposed to a first building block, irradiated

selects F5 the current process is displayed at step 516 while    with a second mask, exposed to a second building block, etc.

selection of F6 starts the main portion of the program, i.e.,    Each combination of masked irradiation and exposure to a

the actual synthesis according to the selected process. If the    building block is referred to herein as a "cycle."

user selects F7 the system displays the location of the    In a preferred binary masking scheme, the masks for each

synthesized peptides, while pressing F10 returns the user to    cycle allow irradiation of half of a region of interest on the

the disk operating system.    substrate and protection of the remaining half of the region

FIG. 5b illustrates the synthesis step 518 in greater detail.    of interest. By "half" it is intended herein not to mean

The main loop of the program is started in which the system    exactly one-half the region of interest, but instead a large

first moves the mask to a ncn position at step 526. During fractionof the region of interest such as from about 30 to 70 

the main loop of the program, necessary che mi cals flow 30 percent of the region of interest It will be understood that 

through the reaction cell unda the direction of the on-board ^ m^tiring schen^ need itoc take a binary fomu 

oofflputer in the peptide syntbesizex. At step 528 the system . instead non-binary cycles may be introduced as desired 

then waits for an exposure command and. upon reoeipc of the between binary cycles. 

exposure command exposes the substrate for a desired time . Jq pt e feu e d embodiments of the binary ™<Wng scfaeme> 

at step 530. When an acknowledge of exposure coa^lete is 33 a given cyde illuminates only about half of the region which 

received at step 532 the system detamines if the process is fliuminatcd in a previous cyde^ whfle protecting the 

complete at step 534 and. if so. waits for additional keyboard remaining half of the «ni«tTtwn«»i^ portion from the previous 

iiqxit at step 536 and, thereafter, exits the perfonn synthesis cyde. Convcndy. in sudi prefened embodiments, a given 

process. cyde illuminates half of the region whicfa was protected in 

A computer program used for operation oC the system 40 previous cyde and protects half the region which was 

described above is indoded as microfiche AppeiKiix A protected in a previous cyde. 

(Copyright, 1990, Affymax Technologies N.V., all rights jbc synthesis strategy is most readily illustrated and 

reserved). The program is written in Turbo C++ (Boriand handled in matrix notation. At r*^ synthesis aite, the 

Int*l) and has been impJonmrrd in an IBM compatible (ieteimination of whether to add a given monomer is a binary 

system. The motor control software is adajxed from software 43 pnxKSS. Thaefore, each product element is given by the 

produced by Newport Cotpocation. It wiU be recognized that ^ product of two vectors, a chemical reactant vector, eg., 

a large variety of programming languages could be utilized Cm[a3,CJ)). and a binary vector Oj, Inspection of the 

without deputing from the scope of the invention herein. products in the example bdow for a four-step systbesis. 

Certain calls are made to a gr^cs program in *Ttogram- ^hat in one four-step synthesis o/^UOXOh a^lJ>, 

ma Guide to PC and PS2 Video Systems" (Wilton, 30 q ay=l0.1.1.0], and 04=10.1,0,11. where a 1 indicates 

Microsoft ftess, 1987). whidi is incoqwratcd herein by iHoimnation and a 0 indicates procectioa. Therefore, it 

reference for aD purposes. becomes possible to build a ^switch matrix** S from the 

Alignment of the muk is achieved by one of two methods cotmnn vectors o. 0= 1 where k is the number cf products), 
in pa c f cued embodiments. In a first embodiment tbe system 

relics upon relative align wvwt of the various oonq)onests, 35 oi os as ct 

wfaidi is normally acceptable since x-y-z translation stages ^ 
arec^bleofsuffident accuracy for the purposes herein. In 
alternative ^^r&itMwnt^t*^ alignment marks on tbe substrate 
ve coupled to a CCD device for tpprofptuxc alignment 

According to some embodiments, pure reagents are not 60 
added at each stq). or complete photolysis of the protective 

groups is not provided at each step. According to these The outcome P of a synthesis is singly I^CS< the product 

embodiments, multiple products will be formed in each of the chemical reactant matrix and the switch matrix, 

synthesis site. For example, if the monooiers A and B arc The switdi matrix for an ih^cle synthesb yielding k 

mixed' during a synthesis step. A and B will bind to depro- 65 products has n rows and k columns. An im por ta nt attribute 

tected regions, roughly in proportion to their concentration of S is that eadi row specifies a mask. A two-dimensional 

in solution. Hence, a mixture of oonqxxinds will be formed mask zii^ for the jth chemical step of a synthesis is obtainrid 
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diiTCtly from the jth low of S by pUdng the clezncnts . locatioas oo the substrate ait simply defined by the colamiu 

. . Sjjt into, for example, a sqam fonnat The paiticular of the switch matrix (the first column indicating, for 

znaogement below provides a square format, although lin- example, that the produaABCD will be present in the upper 

car or otha anangcmcats may be utilized left-hand locaiioo of the substrate). FurthcaiKHc, if only 

3 selected desired products are to bc-made. the mask sequence 

'u fn ID *u can be derived by extracting the columns with the desired 

9n IB rtA ^JTfj : sequences. FcH- example, to form the produa set ABCD. 

^"iji m m ri*'^^ifl /> ^' 

- f<2mcd by use of a switch matrix with only the Isc 3rd, 5th, 

X4I *o '« . 10 7th. 9ih. 11th, Dth. and 15th cohnnns arranged into the 

switch matrix: 

Of oounc, compounds formed in a light-activated syn- 
thesis can be positioned in any defined geometdc aziay. A i i i i o o o o 
square or rectangular matrix is convenient but not requ^ncd. i i o o i i o o 
The rows of the switch matrix may be transformed into any i o i o i o i o 
ooDvenient atray as long as equivalent transformations arc 

used for each row. i i 1 i I i I i 

For example, the masks in the four-stq> synthesis below 
are then denoted by: To form all of the polymers of length 4, the rcactant matrix 

[ABCDABCDABCDABCDJis used. The switch matrix win 
I 0 0 0 0 I ^ be formed from a inaicix(rf the binary nuinbers from 0 to 2** 

"'^o ©"^"i o"*^o 1 anangcd in columns. The columns having four monomers 

are than selected and arranged into a switch matrix, 
where 1 denotes iDumination (activation) and 0 denotes no Therefore, it is seen that the binary switch matrix in general 
illumination. ptovide a r^rcsentation of aU the products which can 

The TnatTiT representation is used to generate a desired set be made from an i^stcp S3^thesis, from which the desired 
of products and product maps in prcfenod cT nb ' ^^"*^"^* products are then extracted. 

Each oorqxmd is defined by the produa of the chemical The rows of the binary switch mrtrix will, in preferred 
vector and a particular switch vector. Therefore, for each embodiments, have the property that each masking step 
synthesis address, one sinqily saves the switch vector. Illuminates half of Che synthesis area. Eadi TTv»tVing step 
assembles all of them into a switch matrix, and extracts eadi ^ also factors the preceding masking step; that is. half of the 
of the rows to form tt>c masks. . . region that was illuminated in the preceding step u again 

In some cases, particular prodzxt distributions or a maxi- , illuminated, whereas the other half is noL Half of the region 
mal number of products are desired.' For example, for that was unilluminated in the preceding step is also 
C=^ A3.CD], any switch vector (Oy) consists of four bits. - illuminated, whereas the other half is noc Thus, masking is 
• Sixteen four-bit vecton cxisL Hence, a maiimiim of 16 recursive. The masks are constructed, as described 
different products can be made by sequential addition of the previously, by extracting the elements of ri t r fa row and 
reagents [A3.CJD]. These 16 column vecton can be placing them in a square array. For exanq}Ie, the four masks 
assembled in 16! different ways to form a switch matrix. The in S for a four-step synthesis are: 
order of the column vectors defines the ™«<""g patterns, 

and. therefore, the spatial ordering of products but not their ^ i i i i i i i i 

makeq>. One ordering of these columns gives the following i i i i o o o o 

switch matrix (in which **null'' (6) additions are induded in "'"oooo'^^iiii 
brackets for the sake of completeness, although such null oooo oooo 

additions are elsewhere ignored hocin): 

43 110 0 10 10 

110 0 10 10 
1 I 0 0 1 0 I 0 
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The recursive factoring <3i masks allows the products of a 
lightrdirccted synthesis to be represented by a polynomiat 
(Some light activated syntheses can only be denoted by 
ixredudble, ix.. prime polynomials.) For example, the poly- 
nomial cocrcspooding to the top synthesis of FIG. 9b 
(discussed below) is 

P^A+BXC +D) 

The columns of S according to this wspca. of the invention 

are the binary representations of the numbers 15 to 0. The A reaction polynomial nuy be eaqunded as though it were 
sixteen products of this binary synthesis are ABCD. ABC. 60 an algebraic expression, provided that the order cfjoimng of 
ABD. AB. ACD. AC AD. A. BC3), BC, BD, B, CD, C, D, reactanls and is preserved (XjXj ^XjXi), ie., the 
and 6 (null). Also note that each of the switch vectors from products are oot commutative. The product then is AC+AD-f 
the four-step synthesis masks above (and hence the synthesis BC+BD. The polynomial explicitly specifies the reactuts 
products) are present in the four bit binary switch matrix. and inqilicxtly specifies the xxuskfor each step. Each pair of 
(See columns 6, 7, 10. and 11) 63 parentheses demarcates a round of synthesis. The 

This synthesis procedure provides an easy way for map- rcactaats of a round (eg.. A and B) react at nonovaiapping 
ping the completed products. The products in the various sites and hence cannot combine with one other. The syntbe- 
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sis area is dividod equally uooogst the elements of a round 
(cg^ A is directed to one -half of the area aod B to the ctha 
half). Heace. the uuLsks for a round (e.g^ Che masks m^and 
mB) arc ortbogotial and form an csthononnal set. The 
potynomial ooCatioa also signifies that each element in a 
round is to be joioed to each element of the next round (e.g., 
A with C, A with D. B with C and B with D). This is 
acooc^shed by having mc oveii^ an equally, and 
likewise for m^ Because C and D are elements of a round, 
mc and m^, are orthogonal to each other and form an 
ortioocnxul set 
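
The orthogonality and equal-overlap conditions described for the masks of P = (A+B)(C+D) can be checked numerically. The following is a minimal sketch only; the flattened four-site layout (sites ordered AC, AD, BC, BD) and the helper name `overlap` are assumptions made for illustration and are not taken from the specification.

```python
# Masks for the rounds of P = (A+B)(C+D), flattened over four synthesis sites.
# The site order (AC, AD, BC, BD) is an assumption made for this sketch.
masks = {
    "A": [1, 1, 0, 0],
    "B": [0, 0, 1, 1],
    "C": [1, 0, 1, 0],
    "D": [0, 1, 0, 1],
}

def overlap(m1, m2):
    """Number of sites illuminated by both masks (their dot product)."""
    return sum(a * b for a, b in zip(m1, m2))

# Masks within a round never share a site.
same_round = [overlap(masks["A"], masks["B"]), overlap(masks["C"], masks["D"])]
# Masks from successive rounds share sites equally.
cross_round = [overlap(masks[x], masks[y]) for x in "AB" for y in "CD"]
print(same_round, cross_round)
```

A zero dot product within a round corresponds to the statement that reactants of one round react at nonoverlapping sites; the equal cross-round overlaps correspond to each element of one round being joined to each element of the next.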

The polynomial representation of the binary synthesis described above, in which 16 products
are made from 4 reactants, is

    P = (A+∅)(B+∅)(C+∅)(D+∅)

which gives ABCD, ABC, ABD, AB, ACD, AC, AD, A, BCD, BC, BD, B, CD, C, D, and ∅ when expanded
(with the rule that ∅X=X and X∅=X, and remembering that joining is ordered). In a binary
synthesis, each round contains one reactant and one null (denoted by ∅). Half of the
synthesis area receives the reactant and the other half receives nothing. Each mask overlaps
every other mask equally.
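
The expansion rule above (ordered joining, with a null that leaves a sequence unchanged) can be illustrated with a short sketch. The function name `expand` and the use of an empty string for the null are illustrative choices, not part of the specification.

```python
from itertools import product

NULL = ""  # stands for the null addition; joining with it leaves a sequence unchanged

def expand(rounds):
    """Expand a product of rounds, e.g. [["A", "B"], ["C", "D"]], keeping order."""
    return ["".join(choice) for choice in product(*rounds)]

two_round = expand([["A", "B"], ["C", "D"]])
binary = expand([[x, NULL] for x in "ABCD"])
print(two_round)
print(len(binary), binary)
```

Because string concatenation is ordered, the sketch never produces CA or DB, which mirrors the requirement that the products are not commutative.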

Binary rounds and non-binary rounds can be interspersed as desired, as in

    P = (A+∅)(B)(C+D+∅)(E+F+G)

The 18 compounds formed are ABCE, ABCF, ABCG, ABDE, ABDF, ABDG, ABE, ABF, ABG, BCE, BCF, BCG,
BDE, BDF, BDG, BE, BF, and BG. The switch matrix S for this 7-step synthesis is

    S =  1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
         1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
         1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
         0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0
         1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
         0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0
         0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1

The round denoted by (B) places B in all products because the reaction area was uniformly
activated (the mask for B consisted entirely of 1's).
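
As a cross-check on the 7-step example, the switch matrix can be derived mechanically from the polynomial: one column per product and one row per chemical step. This is a sketch under the assumption that products are ordered by expanding the rounds left to right; `switch_matrix` is a hypothetical helper name.

```python
from itertools import product

def switch_matrix(rounds):
    """Return (reactants, products, S) for a product of rounds; '' marks a null."""
    reactants = [x for rnd in rounds for x in rnd if x]
    choices = list(product(*rounds))
    products = ["".join(c) for c in choices]
    S = []
    for i, rnd in enumerate(rounds):
        for x in rnd:
            if x:  # one row (one mask) per chemical step
                S.append([1 if c[i] == x else 0 for c in choices])
    return reactants, products, S

rounds = [["A", ""], ["B"], ["C", "D", ""], ["E", "F", "G"]]
reactants, products, S = switch_matrix(rounds)
for name, row in zip(reactants, S):
    print(name, "".join(map(str, row)))
```

Each printed row is the mask for one reactant; the row for B is all ones because the round (B) has no alternative.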

The number of compounds k formed in a synthesis consisting of r rounds, in which the ith
round has b_i chemical reactants and z_i nulls, is

    k = \prod_{i=1}^{r} (b_i + z_i)

and the number of chemical steps n is

    n = \sum_{i=1}^{r} b_i

The number of compounds synthesized when b=a and z=0 in all rounds is a^{n/a}, compared with
2^n for a binary synthesis. For n=20 and a=5, 625 compounds (all tetramers) would be formed,
compared with 1.049×10^6 compounds in a binary synthesis with the same number of chemical
steps.
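
The two counting formulas can be evaluated directly. The sketch below assumes each round is described by a pair (b, z); the function name `count_products` is illustrative.

```python
from math import prod

def count_products(rounds):
    """rounds is a list of (b, z): reactants and nulls in each round."""
    k = prod(b + z for b, z in rounds)   # number of compounds
    n = sum(b for b, _ in rounds)        # number of chemical steps
    return k, n

seven_step = count_products([(1, 1), (1, 0), (2, 1), (3, 0)])  # (A+∅)(B)(C+D+∅)(E+F+G)
tetramers = count_products([(5, 0)] * 4)                       # a = 5, n = 20
binary_20 = count_products([(1, 1)] * 20)                      # binary, n = 20
print(seven_step, tetramers, binary_20)
```

The three cases correspond to the 18-compound, 7-step example, the 625 tetramers, and the 20-step binary synthesis mentioned in the text.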

It should also be noted that rounds in a polynomial can be nested, as in

    (A+(B+∅)(C+∅))(D+∅)

The products are AD, BCD, BD, CD, D, A, BC, B, C, and ∅.
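
Nested rounds can be expanded recursively. The sketch below represents a polynomial as a small expression tree; the tuple encoding ("sum", [...]) and ("prod", [...]) is an assumption chosen for illustration.

```python
def expand(expr):
    """Expand a nested reaction polynomial into its ordered list of products.

    An expression is a string (one reactant, '' for null), ("sum", [...]) for
    the alternatives within a round, or ("prod", [...]) for successive rounds.
    """
    if isinstance(expr, str):
        return [expr]
    kind, terms = expr
    if kind == "sum":
        return [p for term in terms for p in expand(term)]
    results = [""]
    for term in terms:  # join every partial product to every next element
        results = [left + right for left in results for right in expand(term)]
    return results

inner = ("prod", [("sum", ["B", ""]), ("sum", ["C", ""])])
nested = ("prod", [("sum", ["A", inner]), ("sum", ["D", ""])])
nested_products = expand(nested)
print(nested_products)
```

The ten strings returned match the ten products listed above, although the sketch emits them in a different order.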

Binary syntheses are attractive for two reasons. First, they generate the maximal number of
products (2^n) for a given number of chemical steps (n). For four reactants, 16 compounds are
formed in the binary synthesis, whereas only 4 are made when each round has two reactants. A
10-step binary synthesis yields 1,024 compounds, and a 20-step synthesis yields 1,048,576.
Second, products formed in a binary synthesis are a complete nested set with lengths ranging
from 0 to n. All compounds that can be formed by deleting one or more units from the longest
product (the n-mer) are present. Contained within the binary set are the smaller sets that
would be formed from the same reactants using any other set of masks (e.g., AC, AD, BC, and
BD formed in the synthesis shown in FIG. 6 are present in the set of 16 formed by the binary
synthesis). In some cases, however, the experimentally achievable spatial resolution may not
suffice to accommodate all the compounds formed. Therefore, practical limitations may require
one to select a particular subset of the possible switch vectors for a given synthesis.
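
The "complete nested set" property can be stated as: the binary products are exactly the ordered subsequences of the longest product. A minimal sketch follows, with `nested_set` as a hypothetical helper name.

```python
from itertools import combinations

def nested_set(longest):
    """All products obtainable by deleting zero or more units from `longest`."""
    n = len(longest)
    return {"".join(c) for r in range(n + 1) for c in combinations(longest, r)}

full = nested_set("ABCD")
two_round_products = {"AC", "AD", "BC", "BD"}
print(len(full), two_round_products <= full)
```

`combinations` preserves the order of the input, so each element of the set is a valid ordered product, and the two-round products appear as a subset.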

1. EXAMPLE

FIG. 6 illustrates a synthesis with a binary masking scheme. The binary masking scheme
provides the greatest number of sequences for a given number of cycles. According to this
embodiment, a mask m1 allows illumination of half of the substrate. The substrate is then
exposed to the building block A, which binds at the illuminated regions.

Thereafter, the mask m2 allows illumination of half of the previously illuminated region,
while protecting half of the previously illuminated region. The building block B is then
added, which binds at the illuminated regions from m2.

The process continues with masks m3, m4, and m5, resulting in the product array shown in the
bottom portion of the figure. The process generates 32 (2 raised to the power of the number
of monomers) sequences with 5 (the number of monomers) cycles.
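
A five-cycle binary scheme of this kind can be sketched by taking the mask rows from the binary representations of the site values. The linear site ordering used here (columns counting down from 31 to 0) is an assumption; the actual layout of FIG. 6 is not reproduced.

```python
def binary_masks(n):
    """Rows of the binary switch matrix: columns count down from 2**n - 1 to 0."""
    values = range(2 ** n - 1, -1, -1)
    return [[(v >> (n - 1 - j)) & 1 for v in values] for j in range(n)]

monomers = "ABCDE"
masks = binary_masks(len(monomers))
sequences = [
    "".join(m for m, row in zip(monomers, masks) if row[site])
    for site in range(2 ** len(monomers))
]
print(len(sequences), sequences[:4], [sum(row) for row in masks])
```

Every mask illuminates 16 of the 32 sites, and each site receives a distinct subsequence of ABCDE.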

2. EXAMPLE

FIG. 7 illustrates another preferred binary masking scheme which is referred to herein as the
gray code masking scheme. According to this embodiment, the masks m1 to m5 are selected such
that a side of any given synthesis region is defined by the edge of only one mask. The site
at which the sequence BCDE is formed, for example, has its right edge defined by m5 and its
left side formed by mask m4 (and no other mask is aligned on the sides of this site).
Accordingly, problems created by misalignment, diffusion of light under the mask and the like
will be minimized.
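
The benefit of a Gray code ordering can be seen by counting, for each boundary between neighbouring sites, how many masks change state there. The sketch assumes a one-dimensional strip of 32 sites and the standard reflected Gray code; it does not reproduce the layout of FIG. 7.

```python
def gray(i):
    """Reflected binary Gray code of i."""
    return i ^ (i >> 1)

def edges_per_boundary(codes):
    """For each pair of neighbouring sites, count the masks whose edge lies between them."""
    return [bin(a ^ b).count("1") for a, b in zip(codes, codes[1:])]

n = 5
plain = list(range(2 ** n))
gray_order = [gray(i) for i in plain]
worst_plain = max(edges_per_boundary(plain))
worst_gray = max(edges_per_boundary(gray_order))
print(worst_plain, worst_gray)
```

In the plain binary ordering all five mask edges coincide at the middle boundary, whereas in the Gray ordering every boundary is defined by exactly one mask edge, which is the property the scheme relies on to limit the effect of misalignment.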

3. EXAMPLE

FIG. 8 illustrates another binary masking scheme. According to this scheme, referred to
herein as a modified gray code masking scheme, the number of masks needed is minimized. For
example, the mask m2 could be the same mask as m1 and simply translated laterally. Similarly,
the mask m4 could be the same as mask m3 and simply translated laterally.

4. EXAMPLE

A four-step synthesis is shown in FIG. 9a. The reactants are the ordered set {A,B,C,D}. In
the first cycle, illumination through m1 activates the upper half of the synthesis area.
Building block A is then added to give the distribution 602. Illumination through mask m2
(which activates the lower half), followed by addition of B, yields the next intermediate
distribution 604. C is added after illumination through m3 (which activates the left half)
giving the distribution 604, and D after illumination through m4 (which activates the right
half), to yield the final product pattern 608 (AC, AD, BC, BD).
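
The four cycles of this example can be simulated on a 2x2 grid of synthesis sites. This is an illustrative sketch; the grid orientation (row 0 as the upper half, column 0 as the left half) is an assumption.

```python
reactants = ["A", "B", "C", "D"]
# Masks as 2x2 arrays (1 = illuminated): upper half, lower half, left half, right half.
masks = [
    [[1, 1], [0, 0]],
    [[0, 0], [1, 1]],
    [[1, 0], [1, 0]],
    [[0, 1], [0, 1]],
]

grid = [["", ""], ["", ""]]
for unit, mask in zip(reactants, masks):   # one illumination/coupling cycle each
    for r in range(2):
        for c in range(2):
            if mask[r][c]:
                grid[r][c] += unit
print(grid)
```

Each pass through the outer loop corresponds to one illumination followed by one coupling, and the final grid holds the product map for the four sites.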

5. EXAMPLE

The above masking strategy for the synthesis may be extended for all 400 dipeptides from the
20 naturally occurring amino acids as shown in FIG. 9b. The synthesis consists of two rounds,
with 20 photolysis and chemical coupling cycles per round. In the first cycle of round 1,
mask 1 activates 1/20th of the substrate for coupling with the first of 20 amino acids.
Nineteen subsequent illumination/coupling cycles in round 1 yield a substrate consisting of
20 rectangular stripes each bearing a distinct member of the 20 amino acids. The masks of
round 2 are perpendicular to round 1 masks and therefore a single illumination/coupling cycle
in round 2 yields 20 dipeptides. The 20 illumination/coupling cycles of round 2 complete the
synthesis of the 400 dipeptides.

6. EXAMPLE

The power of the binary masking strategy can be appreciated by the outcome of a 10-step
synthesis that produced 1,024 peptides. The polynomial expression for this 10-step binary
synthesis was:

    (f+∅)(Y+∅)(G+∅)(A+∅)(G+∅)(T+∅)(F+∅)(L+∅)(S+∅)(F+∅)

Each peptide occupied a 400×400 μm square. A 32×32 peptide array (1,024 peptides, including
the null peptide and 10 peptides of length l=1, and a limited number of duplicates) was
clearly evident in a fluorescence scan following side group deprotection and treatment with
the antibody 3E7 and fluoresceinated antibody. Each synthesis site was a 400×400 μm square.

The scan showed a range of fluorescence intensities, from a background value of 3,300 counts
to 22,400 counts in the brightest square (x=20, y=9). Only 15 compounds exhibited an
intensity greater than 12,300 counts. The median value of the array was 4,800 counts.

The identity of each peptide in the array could be determined from its x and y coordinates
(each range from 0 to 31) and the map of FIG. 10. The chemical units at positions 2, 5, 6, 9,
and 10 are specified by the y coordinate and those at positions 1, 3, 4, 7, 8 by the x
coordinate. All but one of the peptides was shorter than 10 residues. For example, the
peptide at x=12 and y=3 is YGAGF (SEQ. ID No:3) (positions 1, 6, 8, 9, and 10 are nulls).
YGAFLS (SEQ. ID No:4), the brightest element of the array, is at x=20 and y=9.

It is often desirable to deduce a binding affinity of a given peptide from the measured
fluorescence intensity. Conceptually, the simplest case is one in which a single peptide
binds to a univalent antibody molecule. The fluorescence scan is carried out after the slide
is washed with buffer for a defined time. The order of fluorescence intensities is then a
measure primarily of the relative dissociation rates of the antibody-peptide complexes. If
the on-rate constants are the same (e.g., if they are diffusion-controlled), the order of
fluorescence intensities will correspond to the order of binding affinities. However, the
situation is sometimes more complex because a bivalent primary antibody and a bivalent
secondary antibody are used. The density of peptides in a synthesis area corresponded to a
mean separation of ~7 nm, which would allow multivalent antibody-peptide interactions. Hence,
fluorescence intensities obtained according to the method herein will often be a qualitative
indicator of binding affinity.

Another important consideration is the fidelity of synthesis. Deletions are produced by
incomplete photodeprotection or incomplete coupling. The coupling yield per cycle in these
experiments is typically between 85% and 95%. Implementing the switch matrix by masking is
imperfect because of light diffraction, internal reflection, and scattering. Consequently,
stowaways (chemical units that should not be on board) arise by unintended illumination of
regions that should be dark. A binary synthesis array contains many of the controls needed to
assess the fidelity of a synthesis. For example, the fluorescence signal from a synthesis
area nominally containing a tetrapeptide ABCD could come from a tripeptide deletion impurity
such as ACD. Such an artifact would be ruled out by the finding that the fluorescence
intensity of the ACD site is less than that of the ABCD site.

The fifteen most highly labelled peptides in the array obtained with the synthesis of 1,024
peptides described above were YGAFLS (SEQ. ID No:5), YGAFS (SEQ. ID No:6), YGAFL (SEQ. ID
No:7), YGGFLS (SEQ. ID No:8), YGAF (SEQ. ID No:8), YGALS (SEQ. ID No:9), YGGFS (SEQ. ID
No:10), YGAL (SEQ. ID No:11), YGAFLF (SEQ. ID No:12), YGAF (SEQ. ID No:13), YGAFF (SEQ. ID
No:14), YGGLS (SEQ. ID No:15), YGGFL (SEQ. ID No:16), ... (SEQ. ID No:17), and YGAFLSF (SEQ.
ID No: ...). All fifteen begin with YG, which agrees with previous work showing that an
amino-terminal tyrosine is a key determinant of binding. Residue 3 of this set is either A or
G, and residue 4 is either F or L. The exclusion of S and T from these positions is clear
cut. The finding that the preferred sequence is YG(A/G)(F/L) fits nicely with the outcome of
a study in which a very large library of peptides on phage generated by recombinant DNA
methods was screened for binding to antibody 3E7 (see Cwirla et al., Proc. Natl. Acad. Sci.
USA (1990) 87:6378, incorporated herein by reference). Additional binary syntheses based on
leads from peptides on phage experiments show that YGAFMQ (SEQ. ID No:18), YGAFM (SEQ. ID
No:19), and YGAFQ (SEQ. ID No:20) give stronger fluorescence signals than does YGGFM, the
immunogen used to obtain antibody 3E7.

Variations on the above masking strategy will be valuable in certain circumstances. For
example, if a "kernel" sequence of interest consists of PQR separated from XYZ and the aim is
to synthesize peptides in which these units are separated by a variable number of different
residues, then the kernel can be placed in each peptide by using a mask that has 1's
everywhere. The polynomial representation of a suitable synthesis is:

    (P)(Q)(R)(A+∅)(B+∅)(C+∅)(D+∅)(X)(Y)(Z)

Sixteen peptides will be formed, ranging in length from the 6-mer PQRXYZ to the 10-mer
PQRABCDXYZ.

Several masking strategies will also find value in selected circumstances. By using a
particular mask more than once, two or more reactants will appear in the same set of
products. For example, suppose that the mask for an 8-step synthesis is

    A  1 1 1 1 0 0 0 0
    B  0 0 0 0 1 1 1 1
    C  1 1 0 0 1 1 0 0
    D  0 0 1 1 0 0 1 1
    E  1 0 1 0 1 0 1 0
    F  0 1 0 1 0 1 0 1
    G  1 1 1 1 0 0 0 0
    H  0 0 0 0 1 1 1 1

The products are ACEG, ACFG, ADEG, ADFG, BCEH, BCFH, BDEH, and BDFH. A and G always appear
together because their additions were directed by the same mask, and likewise for B and H.

C. Linker Selection

According to preferred embodiments the linker molecules used as an intermediary between the
synthesized polymers and the substrate are selected for optimum length and/or type for
improved binding interaction with a receptor. According to this aspect of the invention
diverse linkers of varying length and/or type are synthesized for subsequent attachment of a
ligand. Through variations in the length and type of linker, it becomes possible to optimize
the binding interaction between an immobilized ligand and its receptor.

The degree of binding between a ligand (peptide, inhibitor, hapten, drug, etc.) and its
receptor (enzyme, antibody, etc.) when one of the partners is immobilized on to a substrate
will in some embodiments depend on the accessibility of the receptor in solution to the
immobilized ligand. The accessibility in turn will depend on the length and/or type of linker
molecule employed to immobilize one of the partners. Preferred embodiments of the invention
therefore employ the VLSIPS™ technology described herein to generate an array of, preferably,
inactive or inert linkers of varying length and/or type, using photochemical protecting
groups to selectively expose different regions of the substrate and to build upon
chemically-active groups.

In the simplest embodiment of this concept, the same unit is attached to the substrate in
varying multiples or lengths in known locations on the substrate via VLSIPS™ techniques to
generate an array of polymers of varying length. A single ligand (peptide, drug, hapten,
etc.) is attached to each of them, and an assay is performed with the binding site to
evaluate the degree of binding with a receptor that is known to bind to the ligand. In cases
where the linker length impacts the ability of the receptor to bind to the ligand, varying
levels of binding will be observed. In general, the linker which provides the highest binding
will then be used to assay other ligands synthesized in accordance with the techniques
herein.

According to other embodiments the binding between a single ligand/receptor pair is evaluated
for linkers of diverse monomer sequence. According to these embodiments, the linkers are
synthesized in an array in accordance with the techniques herein and have different monomer
sequence (and, optionally, different lengths). Thereafter, all of the linker molecules are
provided with a ligand known to have at least some binding affinity for a given receptor. The
given receptor is then exposed to the ligand and binding affinity is deduced. Linker
molecules which provide adequate binding between the ligand and receptor are then utilized in
screening studies.

D. Protecting Groups

As discussed above, selectively removable protecting groups allow creation of well defined
areas of substrate surface having differing reactivities. Preferably, the protecting groups
are selectively removed from the surface by applying a specific activator, such as
electromagnetic radiation of a specific wavelength and intensity. More preferably, the
specific activator exposes selected areas of surface to remove the protecting groups in the
exposed areas.

Protecting groups of the present invention are used in conjunction with solid phase oligomer
syntheses, such as peptide syntheses using natural or unnatural amino acids, nucleotide
syntheses using deoxyribonucleic and ribonucleic acids, oligosaccharide syntheses, and the
like. In addition to protecting the substrate surface from unwanted reaction, the protecting
groups block a reactive end of the monomer to prevent self-polymerization. For instance,
attachment of a protecting group to the amino terminus of an activated amino acid, such as an
N-hydroxysuccinimide-activated ester of the amino acid, prevents the amino terminus of one
monomer from reacting with the activated ester portion of another during peptide synthesis.
Alternatively, the protecting group may be attached to the carboxyl group of an amino acid to
prevent reaction at this site. Most protecting groups can be attached to either the amino or
the carboxyl group of an amino acid, and the nature of the chemical synthesis will dictate
which reactive group will require a protecting group. Analogously, attachment of a protecting
group to the 5'-hydroxyl group of a nucleoside during synthesis using, for example,
phosphate-triester coupling chemistry, prevents the 5'-hydroxyl of one nucleoside from
reacting with the 3'-activated phosphate-triester of another.

Regardless of the specific use, protecting groups are employed to protect a moiety on a
molecule from reacting with another reagent. Protecting groups of the present invention have
the following characteristics: they prevent selected reagents from modifying the group to
which they are attached; they are stable (that is, they remain attached to the molecule) to
the synthesis reaction conditions; they are removable under conditions that do not adversely
affect the remaining structure; and once removed do not react appreciably with the surface or
surface-bound oligomer. The selection of a suitable protecting group will depend, of course,
on the chemical nature of the monomer unit and oligomer, as well as the specific reagents
they are to protect against.

In a preferred embodiment, the protecting groups are photoactivatable. The properties and
uses of photoreactive protecting compounds have been reviewed. See, McCray et al., Ann. Rev.
of Biophys. and Biophys. Chem. (1989) 18:239-270, which is incorporated herein by reference.
Preferably, the photosensitive protecting groups will be removable by radiation in the
ultraviolet (UV) or visible portion of the electromagnetic spectrum. More preferably, the
protecting groups will be removable by radiation in the near UV or visible portion of the
spectrum. In some embodiments, however, activation may be performed by other methods such as
localized heating, electron beam lithography, laser pumping, oxidation or reduction with
microelectrodes, and the like. Sulfonyl compounds are suitable reactive groups for electron
beam lithography. Oxidative or reductive removal is accomplished by exposure of the
protecting group to an electric current source, preferably using microelectrodes directed to
the predefined regions of the surface which are desired for activation. Other methods may be
used in light of this disclosure.

Many, although not all, of the photoremovable protecting groups will be aromatic compounds
that absorb near-UV and visible radiation. Suitable photoremovable protecting groups are
described in, for example, McCray et al., Patchornik, J. Amer. Chem. Soc. (1970) 92:6333, and
Amit et al., J. Org. Chem. (1974) 39:192, which are incorporated herein by reference.

A preferred class of photoremovable protecting groups has the general formula:

    [Chemical structure: general formula]

where R1, R2, R3, and R4 independently are a hydrogen atom, a lower alkyl, aryl, benzyl,
halogen, hydroxyl, alkoxyl, thiol, thioether, amino, nitro, carboxyl, formate, formamido or
phosphido group, or adjacent substituents (i.e., R1-R2, R2-R3, R3-R4) are substituted oxygen
groups that together form a cyclic acetal or ketal; R5 is a hydrogen atom, an alkoxyl, alkyl,
hydrogen, halo, aryl or alkenyl group, and n=0 or 1.




I. 
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A prtf cned protecting group, 6-nitroverairyl (NV). which 
is used for procectLBg the csibofxyl tcnrriniis cf an ammo add 
or the faydroxyl groop of a nucleotide, for examine, is 
farmed when and are each a mcthoxygroup, R*, R* 
and R^ are each a bydrogCD atom, and oM): 




OMe 



O&Ce 



A preferred protecting group, 6-mtrovcratrylaxycarbonyl 
(NVOQ. which is used to protect the amino tfrminns of an 
annuo add, for example, is fanned when R^ and R^ arc each 
a mcthooty group, R^, R^ and R' arc each a hydrogen atom, 
and B=3l: 



o 




OMe 



OMe 



Another preferred protecting group, 6-nitropiperonyl (NP), which is used for protecting the carboxyl terminus of an amino acid or the hydroxyl group of a nucleotide, for example, is formed when R2 and R3 together form a methylene acetal, R1, R4 and R5 are each a hydrogen atom, and n=0:

[Chemical structure: 6-nitropiperonyl (NP)]

Another preferred protecting group, 6-nitropiperonyloxycarbonyl (NPOC), which is used to protect the amino terminus of an amino acid, for example, is formed when R2 and R3 together form a methylene acetal, R1, R4 and R5 are each a hydrogen atom, and n=1:

[Chemical structure: 6-nitropiperonyloxycarbonyl (NPOC)]

A most preferred protecting group, methyl-6-nitroveratryl (MeNV), which is used for protecting the carboxyl terminus of an amino acid or the hydroxyl group of a nucleotide, for example, is formed when R2 and R3 are each a methoxy group, R1 and R4 are each a hydrogen atom, R5 is a methyl group, and n=0:

[Chemical structure: methyl-6-nitroveratryl (MeNV)]

Another most preferred protecting group, methyl-6-nitroveratryloxycarbonyl (MeNVOC), which is used to protect the amino terminus of an amino acid, for example, is formed when R2 and R3 are each a methoxy group, R1 and R4 are each a hydrogen atom, R5 is a methyl group, and n=1:

[Chemical structure: methyl-6-nitroveratryloxycarbonyl (MeNVOC)]

Another most preferred protecting group, methyl-6-nitropiperonyl (MeNP), which is used for protecting the carboxyl terminus of an amino acid or the hydroxyl group of a nucleotide, for example, is formed when R2 and R3 together form a methylene acetal, R1 and R4 are each a hydrogen atom, R5 is a methyl group, and n=0:

[Chemical structure: methyl-6-nitropiperonyl (MeNP)]

Another most preferred protecting group, methyl-6-nitropiperonyloxycarbonyl (MeNPOC), which is used to protect the amino terminus of an amino acid, for example, is formed when R2 and R3 together form a methylene acetal, R1 and R4 are each a hydrogen atom, R5 is a methyl group, and n=1:

[Chemical structure: methyl-6-nitropiperonyloxycarbonyl (MeNPOC)]
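The eight named nitrobenzyl-type groups above differ only in three choices: whether R2 and R3 are methoxy groups or a methylene acetal, whether R5 is hydrogen or methyl, and whether n is 0 or 1. The following is a minimal illustrative sketch of that bookkeeping, not part of the specification; the class name `NitrobenzylGroup`, its field names, and the helper `most_preferred` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NitrobenzylGroup:
    """One protecting group from the general formula (R1 and R4 are hydrogen in all eight)."""
    name: str
    r2_r3: str  # "methoxy" (veratryl series) or "methylene acetal" (piperonyl series)
    r5: str     # "H" or "methyl" (substituent on the benzylic carbon)
    n: int      # 0 = benzyl type, 1 = oxycarbonyl type

    def protects(self) -> str:
        # Per the text: n=1 groups protect the amino terminus,
        # n=0 groups protect a carboxyl terminus or a nucleotide hydroxyl.
        if self.n == 1:
            return "amino terminus"
        return "carboxyl terminus or nucleotide hydroxyl"


GROUPS = [
    NitrobenzylGroup("NV", "methoxy", "H", 0),
    NitrobenzylGroup("NVOC", "methoxy", "H", 1),
    NitrobenzylGroup("NP", "methylene acetal", "H", 0),
    NitrobenzylGroup("NPOC", "methylene acetal", "H", 1),
    NitrobenzylGroup("MeNV", "methoxy", "methyl", 0),
    NitrobenzylGroup("MeNVOC", "methoxy", "methyl", 1),
    NitrobenzylGroup("MeNP", "methylene acetal", "methyl", 0),
    NitrobenzylGroup("MeNPOC", "methylene acetal", "methyl", 1),
]


def most_preferred(groups):
    """Names of the groups the text calls 'most preferred' (R5 is methyl)."""
    return [g.name for g in groups if g.r5 == "methyl"]
```

Calling `most_preferred(GROUPS)` lists the four methyl-substituted groups, and `GROUPS[1].protects()` reports the amino terminus for NVOC.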



A protected amino acid having a photoactivatable oxycarbonyl protecting group, such as NVOC or NPOC or their corresponding methyl derivatives, MeNVOC or MeNPOC, respectively, on the amino terminus is formed by acylating the amine of the amino acid with an activated oxycarbonyl ester of the protecting group. Examples of activated oxycarbonyl esters of NVOC and MeNVOC have the general formula:

[Chemical structures: NVOC-X and MeNVOC-X]

where X is halogen, mixed anhydride, phenoxy, p-nitrophenoxy, N-hydroxysuccinimide, and the like.

A protected amino acid or nucleotide having a photoactivatable protecting group, such as NV or NP or their corresponding methyl derivatives, MeNV or MeNP, respectively, on the carboxy terminus of the amino acid or 5'-hydroxy terminus of the nucleotide, is formed by acylating the carboxy terminus or 5'-OH with an activated benzyl derivative of the protecting group. Examples of activated benzyl derivatives of MeNV and MeNP have the general formula:

[Chemical structures: activated benzyl derivatives of MeNV and MeNP]

where X is halogen, hydroxyl, tosyl, mesyl, trifluoromethyl, diazo, azido, and the like.

Another method for generating protected monomers is to react the benzylic alcohol derivative of the protecting group with an activated ester of the monomer. For example, to protect the carboxyl terminus of an amino acid, an activated ester of the amino acid is reacted with the alcohol derivative of the protecting group, such as 6-nitroveratrol (NVOH). Examples of activated esters suitable for such uses include halo-formate, mixed anhydride, imidazoyl formate, acyl halide, and also includes formation of the activated ester in situ and the use of common reagents such as DCC and the like. See Atherton et al. for other examples of activated esters.

A further method for generating protected monomers is to react the benzylic alcohol derivative of the protecting group with an activated carbon of the monomer. For example, to protect the 5'-hydroxyl group of a nucleic acid, a derivative having a 5'-activated carbon is reacted with the alcohol derivative of the protecting group, such as methyl-6-nitropiperonol (MePyROH). Examples of nucleotides having activating groups attached to the 5'-hydroxyl group have the general formula:

[Chemical structure: nucleotide bearing an activating group Y at the 5' position]

where Y is a halogen atom, a tosyl, mesyl, trifluoromethyl, azido, or diazo group, and the like.

Another class of preferred photochemical protecting groups has the formula:

[Chemical structure: pyrenylmethyl-type protecting group]

where R1, R2, and R3 independently are a hydrogen atom, a lower alkyl, aryl, benzyl, halogen, hydroxyl, alkoxyl, thiol, thioether, amino, nitro, carboxyl, formate, formamido, sulfanates, sulfido or phosphido group, R4 and R5 independently are a hydrogen atom, an alkoxy, alkyl, halo, aryl, hydrogen, or alkenyl group, and n=0 or 1.

A preferred protecting group, 1-pyrenylmethyloxycarbonyl (PyROC), which is used to protect the amino terminus of an amino acid, for example, is formed when R1 through R5 are each a hydrogen atom and n=1:

[Chemical structure: 1-pyrenylmethyloxycarbonyl (PyROC)]



Another preferred protecting group, 1-pyrenylmethyl (PyR), which is used for protecting the carboxy terminus of an amino acid or the hydroxyl group of a nucleotide, for example, is formed when R1 through R5 are each a hydrogen atom and n=0:

[Chemical structure: 1-pyrenylmethyl (PyR)]




An amino acid having a pyrenylmethyloxycarbonyl protecting group on its amino terminus is formed by acylation of the free amine of amino acid with an activated oxycarbonyl ester of the pyrenyl protecting group. Examples of activated oxycarbonyl esters of PyROC have the general formula:

[Chemical structure: PyROC-X]

where X is halogen, or mixed anhydride, p-nitrophenoxy, or N-hydroxysuccinimide group, and the like.

A protected amino acid or nucleotide having a photoactivatable protecting group, such as PyR, on the carboxy terminus of the amino acid or 5'-hydroxy terminus of the nucleic acid, respectively, is formed by acylating the carboxy terminus or 5'-OH with an activated pyrenylmethyl derivative of the protecting group. Examples of activated pyrenylmethyl derivatives of PyR have the general formula:

[Chemical structure: PyR-X]

where X is a halogen atom, a hydroxyl, diazo, or azido group, and the like.

Another method of generating protected monomers is to react the pyrenylmethyl alcohol moiety of the protecting group with an activated ester of the monomer. For example, an activated ester of an amino acid can be reacted with the alcohol derivative of the protecting group, such as pyrenylmethyl alcohol (PyROH), to form the protected derivative of the carboxy terminus of the amino acid. Examples of activated esters include halo-formate, mixed anhydride, imidazoyl formate, acyl halide, and also includes formation of the activated ester in situ and the use of common reagents such as DCC and the like.

Clearly, many photosensitive protecting groups are suitable for use in the present invention.

In preferred embodiments, the substrate is irradiated to remove the photoremovable protecting groups and create regions having free reactive moieties and side products resulting from the protecting group. The removal rate of the protecting groups depends on the wavelength and intensity of the incident radiation, as well as the physical and chemical properties of the protecting group itself. Preferred protecting groups are removed at a faster rate and with a lower intensity of radiation. For example, at a given set of conditions, MeNVOC and MeNPOC are photolytically removed from the N-terminus of a peptide chain faster than their unsubstituted parent compounds, NVOC and NPOC, respectively.

Removal of the protecting group is accomplished by irradiation to liberate the reactive group and degradation products derived from the protecting group. Not wishing to be bound by theory, it is believed that irradiation of NVOC- and MeNVOC-protected oligomers occurs by the following reaction schemes:

NVOC-AA -> 3,4-dimethoxy-6-nitrosobenzaldehyde + CO2 + AA

MeNVOC-AA -> 3,4-dimethoxy-6-nitrosoacetophenone + CO2 + AA

where AA represents the N-terminus of the amino acid oligomer.

Along with the unprotected amino acid, other products are liberated into solution: carbon dioxide and a 2,3-dimethoxy-6-nitrosophenylcarbonyl compound, which can react with nucleophilic portions of the oligomer to form unwanted secondary reactions. In the case of an NVOC-protected amino acid, the degradation product is a nitrosobenzaldehyde, while the degradation product for the other is a nitrosophenyl ketone. For instance, it is believed that the product aldehyde from NVOC degradation reacts with free amines to form a Schiff base (imine) that affects the remaining polymer synthesis. Preferred photoremovable protecting groups react slowly or reversibly with the oligomer on the support.

Again not wishing to be bound by theory, it is believed that the product ketone from irradiation of a MeNVOC-protected oligomer reacts at a slower rate with nucleophiles on the oligomer than the product aldehyde from irradiation of the same NVOC-protected oligomer. Although not unambiguously determined, it is believed that this difference in reaction rate is due to the difference in general reactivity between aldehydes and ketones towards nucleophiles due to steric and electronic effects.

The photoremovable protecting groups of the present invention are readily removed. For example, the photolysis of N-protected L-phenylalanine in solution and having different photoremovable protecting groups was analyzed, and the results are presented in the following table:

TABLE

t1/2 (seconds)

Solvent               NBOC   NVOC   MeNVOC        MeNPOC
Dioxane               1288   110    [illegible]   [illegible]
5 mM H2SO4/Dioxane    1575   98     [illegible]   [illegible]

The half life, t1/2, is the time in seconds required to remove 50% of the starting amount of protecting group. NBOC is the 6-nitrobenzyloxycarbonyl group, NVOC is the 6-nitroveratryloxycarbonyl group, MeNVOC is the methyl-6-nitroveratryloxycarbonyl group, and MeNPOC is the methyl-6-nitropiperonyloxycarbonyl group. The photolysis was carried out in the indicated solvent with 362/364 nm-wavelength irradiation having an intensity of 10 mW/cm2, and the concentration of each protected phenylalanine was 0.10 mM.

The table shows that deprotection of NVOC-, MeNVOC-, and MeNPOC-protected phenylalanine proceeded faster than the deprotection of NBOC. Furthermore, it shows that the two derivatives that are substituted on the benzylic carbon, MeNVOC and MeNPOC, were photolyzed at the highest rates in both dioxane and acidified dioxane.
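To make the half-life figures concrete, the sketch below converts a half life into the fraction of protecting group remaining after a given exposure, and into the exposure needed for a target degree of deprotection. It assumes simple first-order (exponential) photolysis kinetics, which the text does not state; the function names are hypothetical and chosen for this illustration only.

```python
import math


def fraction_remaining(t_seconds: float, half_life_seconds: float) -> float:
    """Fraction of protecting group still attached after t seconds.

    Assumes first-order decay: N(t)/N0 = 0.5 ** (t / t_half).
    """
    return 0.5 ** (t_seconds / half_life_seconds)


def time_to_remove(fraction_removed: float, half_life_seconds: float) -> float:
    """Exposure time (seconds) needed to remove the given fraction (0 <= f < 1)."""
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in [0, 1)")
    return half_life_seconds * math.log(1.0 - fraction_removed) / math.log(0.5)
```

Under that assumption, a group with the 110 second half life reported for NVOC in dioxane would be 75% removed after about 220 seconds, since each additional half life halves what remains.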

1. Use of Photoremovable Groups During Solid-Phase Synthesis of Peptides

The formation of peptides on a solid-phase support requires the stepwise attachment of an amino acid to a substrate-bound growing chain. In order to prevent unwanted polymerization of the monomeric amino acid under the reaction conditions, protection of the amino terminus of the amino acid is required. After the monomer is coupled to the end of the peptide, the N-terminal protecting group is removed, and another amino acid is coupled to the chain. This cycle of coupling and deprotecting is continued for each amino acid in the peptide sequence. See Merrifield, J. Am. Chem. Soc. (1963) 85:2149, and Atherton et al., "Solid Phase Peptide Synthesis" 1989, IRL Press, London, both incorporated herein by reference for all purposes. As described above, the use of a photoremovable protecting group allows removal of selected portions of the substrate surface, via patterned irradiation, during the deprotection cycle of the solid phase synthesis. This selectively allows spatial control of the synthesis: the next amino acid is coupled only to the irradiated areas.

In one embodiment, the photoremovable protecting groups of the present invention are attached to an activated ester of an amino acid at the amino terminus:

[Chemical structure: activated amino acid bearing the protecting group X on the amino terminus]

where R is the side chain of a natural or unnatural amino acid, X is a photoremovable protecting group, and Y is an activated carboxylic acid derivative. The photoremovable protecting group, X, is preferably NVOC, NPOC, PyROC, MeNVOC, MeNPOC, and the like as discussed above. The activated ester, Y, is preferably a reactive derivative having a high coupling efficiency, such as an acyl halide, mixed anhydride, N-hydroxysuccinimide ester, perfluorophenyl ester, or urethane protected acid, and the like. Other activated esters and reaction conditions are well known (See Atherton et al.).

2. Use of Photoremovable Groups During Solid-Phase Synthesis of Oligonucleotides

The formation of oligonucleotides on a solid-phase support requires the stepwise attachment of a nucleotide to a substrate-bound growing oligomer. In order to prevent unwanted polymerization of the monomeric nucleotide under the reaction conditions, protection of the 5'-hydroxyl group of the nucleotide is required. After the monomer is coupled to the end of the oligomer, the 5'-hydroxyl protecting group is removed, and another nucleotide is coupled to the chain. This cycle of coupling and deprotecting is continued for each nucleotide in the oligomer sequence. See Gait, "Oligonucleotide Synthesis: A Practical Approach" 1984, IRL Press, London, incorporated herein by reference for all purposes. As described above, the use of a photoremovable protecting group allows removal, via patterned irradiation, of selected portions of the substrate surface during the deprotection cycle of the solid phase synthesis. This selectively allows spatial control of the synthesis: the next nucleotide is coupled only to the irradiated areas.

Oligonucleotide synthesis generally involves coupling an activated phosphorous derivative on the 3'-hydroxyl group of a nucleotide with the 5'-hydroxyl group of an oligomer bound to a solid support. Two major chemical methods exist to perform this coupling: the phosphate-triester and phosphoramidite methods (See Gait). Protecting groups of the present invention are suitable for use in either method.

In a preferred embodiment, a photoremovable protecting group is attached to an activated nucleotide on the 5'-hydroxyl group:

where B is the base attached to the sugar ring; R is a hydrogen atom when the sugar is deoxyribose or R is a hydroxyl group when the sugar is ribose; P represents an activated phosphorous group; and X is a photoremovable protecting group. The photoremovable protecting group, X, is preferably NV, NP, PyR, MeNV, MeNP and the like as described above. The activated phosphorous group, P, is preferably a reactive derivative having a high coupling efficiency, such as a phosphate-triester, phosphoramidite or the like. Other activated phosphorous derivatives, as well as reaction conditions, are well known (See Gait).

3. Amino Acid N-Carboxy Anhydrides Protected With a Photoremovable Group

During Merrifield peptide synthesis, an activated ester of one amino acid is coupled with the free amino terminus of a substrate-bound oligomer. Activated esters of amino acids suitable for the solid phase synthesis include halo-formate, mixed anhydride, imidazoyl formate, acyl halide, and also includes formation of the activated ester in situ and the use of common reagents such as DCC and the like (See Atherton et al.). A preferred protected and activated amino acid has the general formula:

[Chemical structure: urethane-protected amino acid N-carboxy anhydride]

where R is the side chain of the amino acid and X is a photoremovable protecting group. This compound is a urethane-protected amino acid having a photoremovable protecting group attached to the amine. A more preferred activated amino acid is formed when the photoremovable protecting group has the general formula:

[Chemical structure: nitrobenzyl-type protecting group]

where R1, R2, R3, and R4 independently are a hydrogen atom, a lower alkyl, aryl, benzyl, halogen, hydroxyl, alkoxyl, thiol, thioether, amino, nitro, carboxyl, formate, formamido or phosphido group, or adjacent substituents (i.e., R1-R2, R2-R3, R3-R4) are substituted oxygen groups that together form a cyclic acetal or ketal; and R5 is a hydrogen atom, an alkoxyl, alkyl, hydrogen, halo, aryl, or alkenyl group.

A preferred activated amino acid is formed when the photoremovable protecting group is 6-nitroveratryloxycarbonyl. That is, R1 and R4 are each a hydrogen atom, R2 and R3 are each a methoxy group, and R5 is a hydrogen atom. Another preferred activated amino acid is formed when the photoremovable group is
6-nitropiperonyl: R1 and R4 are each a hydrogen atom, R2 and R3 together form a methylene acetal, and R5 is a hydrogen atom. Other protecting groups are possible. Another preferred activated ester is formed when the photoremovable group is methyl-6-nitroveratryl or methyl-6-nitropiperonyl.

Another preferred activated amino acid is formed when the photoremovable protecting group has the general formula:

[Chemical structure: pyrenylmethyl-type protecting group]

where R1, R2, and R3 independently are a hydrogen atom, a lower alkyl, aryl, benzyl, halogen, hydroxyl, alkoxyl, thiol, thioether, amino, nitro, carboxyl, formate, formamido, sulfanates, sulfido or phosphido group, and R4 and R5 independently are a hydrogen atom, an alkoxy, alkyl, halo, aryl, hydrogen, or alkenyl group. The resulting compound is a urethane-protected amino acid having a pyrenylmethyloxycarbonyl protecting group attached to the amine. A more preferred embodiment is formed when R1 through R5 are each a hydrogen atom.

The urethane-protected amino acids having a photoremovable protecting group of the present invention are prepared by condensation of an N-protected amino acid with an acylating agent such as an acyl halide, anhydride, chloroformate and the like (See Fuller et al., U.S. Pat. No. 4,946,942 and Fuller et al., J. Amer. Chem. Soc. (1990) 112:7414-7416, both herein incorporated by reference for all purposes).

Urethane-protected amino acids having photoremovable protecting groups are generally useful as reagents during solid-phase peptide synthesis, and because of the spatial selectivity possible with the photoremovable protecting group, are especially useful for the spatially addressable [illegible]. The urethane group first serves to activate the carboxy terminus for reaction with the amine bound to the surface and, once the peptide bond is formed, the photoremovable protecting group protects the newly formed amino terminus from [illegible] yields typically higher.

IV. Data Collection

A. Data Collection System

Substrates prepared in accordance with the above description are used in one embodiment to determine which of the plurality of sequences thereon bind to a receptor of interest. FIG. 10 illustrates one embodiment of a device used to detect regions of a substrate which contain fluorescent markers. This device would be used, for example, to detect the presence or absence of a labeled receptor such as an antibody which has bound to a synthesized polymer on a substrate.

Light is directed at the substrate from a light source 1002 such as a laser light source of the type well known to those of skill in the art such as a model no. 2025 made by Spectra Physics. Light from the source is directed at a lens 1004 which is preferably a cylindrical lens of the type well known to those of skill in the art. The resulting output from the lens 1004 is a linear beam rather than a spot of light, resulting in the capability to detect data substantially simultaneously along a linear array of pixels rather than on a pixel-by-pixel basis. It will be understood that a cylindrical lens is used herein as an illustration of one technique for generating a linear beam of light on a surface, but that other techniques could also be utilized.

The beam from the cylindrical lens is passed through a dichroic mirror or prism and directed at the surface of the suitably prepared substrate 1008. Substrate 1008 is placed on an x-y translation stage 1009 such as a model no. PM500 made by Newport. Light at certain locations on the substrate will be fluoresced and transmitted along the path indicated by dashed lines back through the dichroic mirror, and focused with a suitable lens 1010 such as an f/1.4 camera lens on a linear detector 1012 via a variable f stop focusing lens 1014. Through use of a linear light beam, it becomes possible to generate data over a line of pixels (such as about 1 cm) along the substrate, rather than from individual points on the substrate. In alternative embodiments, light is directed at a 2-dimensional area of the substrate and fluoresced light detected by a 2-dimensional CCD array. Linear detection is preferred because substantially higher power densities are obtained.

[illegible] from the substrate as a function of position. According to one embodiment the detector is a linear CCD array of the type commonly known to those of skill in the art. The x-y [illegible] operably connected to a computer 1016 such as an IBM PC-AT or equivalent for control of the device and data collection from the CCD array.

[illegible] and intensity data are gathered with the computer via the detector.

FIG. 11 illustrates the architecture of the data collection system. Operation of the system occurs under the direction of the photon counting program 1102, included herewith as Appendix B. The user inputs the scan dimensions, the number of pixels or data points in a region, [illegible]. Via a GPIB bus the program (in an IBM PC compatible computer, for example) interfaces with [illegible] and an x-y stage controller such as a PM500. The signal from the light from the fluorescing substrate enters a photon counter [illegible] indicative of the number of counts in a given region. After scanning a selected area, the stage controller is activated with commands for acceleration and velocity, which in turn drives the scan stage 1112 such as a PM500-A to another region.

Data are collected in an image data file 1114 and processed in a scaling program 1116, also included in Appendix B. A scaled image is output for display on, for example, a VGA display 1118. The image is scaled based on an input of the percentage of pixels to clip and the minimum and maximum pixel levels to be viewed. The system outputs for use the min and max pixel levels in the raw data.

B. Data Analysis

The output from the data collection system is an array of data indicative of fluorescent intensity versus location on the substrate. The data are typically taken over regions substantially smaller than the area in which synthesis of a given polymer has taken place. Merely by way of example, if polymers were synthesized in squares on the substrate having dimensions of 500 microns by 500 microns, the data may be taken over regions having dimensions of 5 microns by 5 microns. In most preferred embodiments, the regions over which fluorescence data are taken across the substrate are less than about 1/2 the area of the regions in which individual polymers are synthesized, preferably less than 1/10 the area in which a single polymer is synthesized, and most preferably less than 1/100 the area in which a single polymer is synthesized. Hence, within any area in which a given polymer has been synthesized, a large number of fluorescence data points are collected.

A plot of number of pixels versus intensity for a scan of a cell when it has been exposed to, for example, a labeled antibody will typically take the form of a bell curve, but spurious data are observed, particularly at higher intensities. Since it is desirable to use an average of fluorescent intensity over a given synthesis region in determining relative binding affinity, these spurious data will tend to undesirably skew the data.

Accordingly, in one embodiment of the invention the data are corrected for removal of these spurious data points, and an average of the data points is thereafter utilized in determining relative binding efficiency.

FIG. 13 illustrates one embodiment of a system for removal of spurious data from a set of fluorescence data such as data used in affinity screening studies. A user or the system inputs data relating to the chip location and cell corners at step 1302. From this information and the image file, the system creates a computer representation of a histogram at step 1304, the histogram (at least in the form of a computer file) plotting number of data pixels versus intensity.

For each cell a main data analysis loop is then performed. For each cell, at step 1306, the system calculates the total intensity or number of pixels for the bandwidth centered around varying intensity levels. For example, as shown in the plot to the right of step 1306, the system calculates the number of pixels within the band of width w. The system then moves this bandwidth to a higher center intensity, and again calculates the number of pixels in the bandwidth. This process is repeated until the entire range of intensities has been scanned, and at step 1308 the system determines which band has the highest total number of pixels. The data within this bandwidth are used for further analysis. Assuming the bandwidth is selected to be reasonably small, this procedure will have the effect of eliminating spurious data located at the higher intensity levels. The system then repeats at step 1310 if all cells have been evaluated, or repeats for the next cell.

At step 1312 the system then integrates the data within the bandwidth for each of the selected cells, sorts the data at step 1314 using the synthesis procedure file, and displays the data to a user on, for example, a video display or a printer.

V. Representative Applications

A. Oligonucleotide Synthesis

The generality of light directed spatially addressable parallel chemical synthesis is demonstrated by application to nucleic acid synthesis.

1. Example

Light activated formation of a thymidine-cytidine dimer was carried out. A three dimensional representation of a fluorescence scan showing a checkerboard pattern generated by the light-directed synthesis of a dinucleotide is shown in FIG. 8. 5'-nitroveratryl thymidine was attached to a synthesis substrate through the 3' hydroxyl group. The nitroveratryl protecting groups were removed by illumination through a 500 micron checkerboard mask. The substrate was then treated with phosphoramidite activated 2'-deoxycytidine. In order to follow the reaction fluorometrically, the deoxycytidine had been modified with an FMOC protected aminohexyl linker attached to the exocyclic amine (5'-O-dimethoxytrityl-4-N-(6-N-fluorenylmethylcarbamoyl-hexylcarboxy)-2'-deoxycytidine). After removal of the FMOC protecting group with base, the regions which contained the dinucleotide were fluorescently labelled by treatment of the substrate with 1 mM FITC in DMF for one hour.

The three-dimensional representation of the fluorescent intensity data in FIG. 14 clearly reproduces the checkerboard illumination pattern used during photolysis of the substrate. This result demonstrates that oligonucleotides as well as peptides can be synthesized by the light-directed method.

VI. Conclusion

The inventions herein provide a new approach for the simultaneous synthesis of a large number of compounds. The method can be applied whenever one has chemical building blocks that can be coupled in a solid-phase format, and when light can be used to generate a reactive group.

The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those of skill in the art upon review of this disclosure. Merely by way of example, while the invention is illustrated primarily with regard to peptide and nucleotide synthesis, the invention is not so limited. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.
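The spurious-data removal described under Data Analysis (steps 1306 and 1308: slide a fixed-width intensity band across the histogram, keep the band holding the most pixels, then average within it) can be sketched as follows. This is an illustrative reading of the procedure, not the code of Appendix B; the function names, the `step` parameter, and the choice to define each band by its lower edge rather than its center are assumptions made for this example.

```python
from statistics import mean


def densest_band(intensities, width, step=1.0):
    """Return (low, high) of the fixed-width band [low, high) holding the most pixels.

    The band is moved from the lowest observed intensity upward in
    increments of `step`; ties are resolved in favor of the lower band.
    """
    data = sorted(intensities)
    if not data:
        raise ValueError("no pixel data")
    best_low, best_count = data[0], -1
    low = data[0]
    while low <= data[-1]:
        count = sum(1 for v in data if low <= v < low + width)
        if count > best_count:
            best_low, best_count = low, count
        low += step
    return best_low, best_low + width


def robust_cell_average(intensities, width, step=1.0):
    """Average only the pixels inside the densest band, discarding outliers."""
    low, high = densest_band(intensities, width, step)
    kept = [v for v in intensities if low <= v < high]
    return mean(kept)
```

For a cell whose pixels cluster near 100 counts with two bright outliers, for example `[100, 101, 99, 102, 98, 100, 500, 650]` with a band width of 10, the selected band excludes the outliers and the average is taken over the six clustered values only.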
ARRAYS OF NUCLEIC ACID PROBES ON BIOLOGICAL CHIPS

CROSS-REFERENCE TO RELATED APPLICATION

[illegible] of application Ser. No. 08/143,312, filed Oct. 26, 1993, now abandoned, which is a continuation [illegible] Jun. 1993, now abandoned, incorporated herein by reference.

Research leading to the invention was funded in part by NIH grant No. 1R01HG00813-01 and DOE grant No. DE-FG03-92-ER81275, and the government may have certain rights to the invention.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention provides arrays of oligonucleotide probes immobilized in microfabricated patterns on silica chips for analyzing molecular interactions of biological interest. The invention therefore relates to diverse fields impacted by the nature of molecular interaction, including chemistry, biology, medicine, and medical diagnostics.

2. Description of Related Art

Oligonucleotide probes have long been used to detect complementary nucleic acid sequences in a nucleic acid of interest (the "target" nucleic acid). In some assay formats, the oligonucleotide probe is tethered, i.e., by covalent attachment, to a solid support, and arrays of oligonucleotide probes immobilized on solid supports have been used to detect specific nucleic acid sequences in a target nucleic acid. See, e.g., PCT patent publication Nos. WO 89/10977 and 89/11548. Others have proposed the use of large numbers of oligonucleotide probes to provide the complete nucleic acid sequence of a target nucleic acid but failed to provide an enabling method for using arrays of immobilized probes for this purpose. See U.S. Pat. Nos. 5,202,231 and 5,002,867 and PCT patent publication No. WO 93/17126.

The development of VLSIPS™ technology has provided [illegible] each of which is incorporated herein by reference. U.S. patent application Ser. No. 082,937, filed Jun. 25, 1993, describes methods for making arrays of oligonucleotide probes that can be used to provide the complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific nucleotide sequence.

Microfabricated arrays of large numbers of oligonucleotide probes, called "DNA chips", offer great promise for a wide variety of applications. New methods and reagents are required to realize this promise, and the present invention helps meet that need.

SUMMARY OF THE INVENTION

The present invention provides methods for making high-density arrays of oligonucleotide probes on silica chips and for using those probe arrays to detect specific nucleic acid sequences contained in a target nucleic acid in a sample. The invention also provides arrays of oligonucleotide probes on DNA chips, in which the probes have specific sequences and locations in the array to facilitate identification of a specific target nucleic acid. In another aspect, the invention provides methods for detecting whether one or more specific [illegible] previously characterized sequence or reference sequence. The methods of the invention can be used to detect variations between a target and reference sequence, including single or multiple base substitutions, and deletions and insertions of bases, as well as detecting the presence, location, and sequence of other more complex variations between a target and reference sequence in a nucleic acid.

The present invention provides arrays of oligonucleotide probes immobilized on a solid support. The arrays are preferably synthesized directly on the support using VLSIPS™ technology, but other synthesis methods and immobilization of pre-synthesized oligonucleotide probes can be used to make the oligonucleotide probe arrays, called "DNA chips", of the invention. In general, these arrays comprise a set of oligonucleotide probes such that, for each base in a specific reference sequence, the set includes a probe (called the "wild-type" or "WT" probe) that is exactly complementary to a section of the reference sequence including the base of interest and four additional probes (called "substitution probes"), which are identical to the WT probe except that the base of interest has been replaced by one of a predetermined set (typically 4) of nucleotides. In the preferred embodiment, one of the four substitution probes is identical to the wild type probe; the other three are complementary to targets that have a single-base substitution at this position.

In another aspect, the invention relates to the arrangement of individual probes in the array. In one embodiment, the probes are arranged on the chip so that probes for a given position in the sequence are adjacent, and probes for adjacent positions in the reference sequence are also adjacent to one another on the chip. One method arranges the probes for a single base in a short column (alternately row) and arranges the columns in the order of the base position to form horizontal (alternately vertical) stripes. The wild-type and each of the substitution probes have specified positions within the column so that all the probes corresponding to an A substitution, for example, are in a single row. The stripes are separated on the chip by a blank row or column.

The DNA chips of the invention can be made in a wide number of variations. For some applications, leaving out the wild-type row, leaving out unimportant bases, pooling [illegible] mutation position using multiple [illegible] placing blank "streets" (no probe) between rows. [illegible]

[illegible] provides DNA chips for detecting mutations associated with cystic fibrosis, including [illegible] exons 4, 7, 9, 10, 11, 20, and 21 of the CFTR gene. The invention also provides DNA chips for detecting mutations in the p53 gene, a gene in which mutations are associated with a wide variety of cancers. Other DNA chips of the invention provide probe arrays for detecting specific sequences of mitochondrial DNA, useful for identification and forensic purposes. The invention also provides DNA chips for detecting specific sequences of nucleotides or mutations associated with the acquisition of a drug resistant phenotype in an infectious organism, such as rifampicin or other drug resistant TB strains and HIV, in which mutations in an RNA polymerase gene are known to give rise to drug resistance.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows how the tiling method of the invention
sequences of a target nucleic acid in a sample varies from a defines a set of DNA probes relative to a target nucleic acid. 
In the figure, the target is a DNA molecule, the probes are single-stranded nucleic acids 16 nucleotides in length, and only a portion of the probes defined by the method is shown.

FIG. 2 shows an illustrative tiled array of the invention with probes for the detection of point mutations. The base at the position of substitution in each of the wild-type probes is shown in the wild-type lane, and the shading shows the location of the substitution probe having the wild-type sequence. The SEQ ID. NOS. corresponding to the two peptide sequences shown in the top portion of FIG. 2 are 311 and 312, respectively. The SEQ ID. NOS. corresponding to the five peptide sequences listed at the bottom of FIG. 2 are 313, 314, 315, 313, and 316, respectively.

FIG. 3, in panels A, B, and C, shows an image made from the region of a DNA chip containing CFTR exon 10 probes; in panel A, the chip was hybridized to a wild-type target; in panel C, the chip was hybridized to a mutant ΔF508 target; and in panel B, the chip was hybridized to a mixture of the wild-type and mutant targets. The SEQ ID. NOS. corresponding to the four peptide sequences shown in FIG. 3 are 317-320, respectively.

FIG. 4, in sheets 1-3, corresponding to panels A, B, and C of FIG. 3, shows graphs of fluorescence intensity versus tiling position. The labels on the horizontal axis show the bases in the wild-type sequence corresponding to the position of substitution in the respective probes. Plotted are the intensities observed from the features (or synthesis sites) containing wild-type probes, the features containing the substitution probes that bound the most target ("called"), and the feature containing the substitution probes that bound the target with the second highest intensity of all the substitution probes ("2nd Highest"). The SEQ ID. NOS. corresponding to the two peptide sequences shown in sheet 1 of FIG. 4 are 321 and 318, respectively; the SEQ ID. NOS. corresponding to the two peptide sequences shown in sheet 2 of FIG. 4 are 322 and 318, respectively; and the SEQ ID. NOS. corresponding to the two peptide sequences shown in sheet 3 of FIG. 4 are 323 and 318, respectively.

FIG. 5, in panels A, B, and C, shows an image made from a region of a DNA chip containing CFTR exon 10 probes; in panel A, the chip was hybridized to the wt480 target; in panel C, the chip was hybridized to the mu480 target; and in panel B, the chip was hybridized to a mixture of the wild-type and mutant targets. The SEQ ID. NOS. corresponding to the peptide sequences shown in FIG. 5 are 324-327, respectively.

FIG. 6, in sheets 1-3, corresponding to panels A, B, and C of FIG. 5, shows graphs of fluorescence intensity versus tiling position. The labels on the horizontal axis show the bases in the wild-type sequence corresponding to the position of substitution in the respective probes. Plotted are the intensities observed from the features (or synthesis sites) containing wild-type probes, the features containing the substitution probes that bound the most target ("called"), and the feature containing the substitution probes that bound the target with the second highest intensity of all the substitution probes ("2nd Highest"). The SEQ ID. NOS. corresponding to the two peptide sequences shown in sheet 1 of FIG. 6 are 328 and 329, respectively; the SEQ ID. NOS. corresponding to the two peptide sequences shown in sheet 2 of FIG. 6 are 330 and 329, respectively; and the SEQ ID. NOS. corresponding to the two peptide sequences shown in sheet 3 of FIG. 6 are 331 and 329, respectively.

FIG. 7, in panels A and B, shows an image made from a region of a DNA chip containing CFTR exon 10 probes; in panel A, the chip was hybridized to nucleic acid derived from the genomic DNA of an individual with wild-type ΔF508 sequences; in panel B, the target nucleic acid originated from a heterozygous (with respect to the ΔF508 mutation) individual.

FIG. 8, in sheets 1 and 2, corresponding to panels A and B of FIG. 7, shows graphs of fluorescence intensity versus tiling position. The labels on the horizontal axis show the bases in the wild-type sequence corresponding to the position of substitution in the respective probes. Plotted are the intensities observed from the features (or synthesis sites) containing wild-type probes, the features containing the substitution probes that bound the most target ("called"), and the feature containing the substitution probes that bound the target with the second highest intensity of all the substitution probes ("2nd Highest"). The SEQ ID NOS. corresponding to peptide sequences shown in sheet 2 of FIG. 8 are 332 [illegible], respectively.

FIG. 9 shows the human mitochondrial genome; "OH" is the H strand origin of replication, and arrows indicate the cloned unshaded sequence.

FIG. 10 shows the image observed from application of a sample of mitochondrial DNA derived nucleic acid (from the mt4 sample) on a DNA chip.

FIG. 11 is similar to FIG. 10 but shows the image observed from the mt5 sample.

FIG. 12 shows the predicted difference image between the mt4 and mt5 samples on the DNA chip based on mismatches between the two samples and the reference sequence.

FIG. 13 shows the actual difference image observed for the mt4 and mt5 samples.

[illegible] normalized [illegible] across rows 10 and 11 of the array and a tabulation [illegible] discrimination between wild-type and [illegible] hybrids obtained with the chip. A median of the six normalized hybridization scores for each probe was taken; [illegible] median score to the normalized [illegible] versus mean counts. A ratio of 1.6 [illegible].

[illegible] the identity of the base mismatch [illegible] influence the ability to discriminate mutant and wild-type sequences more than the position of the mismatch [illegible] oligonucleotide probe. The mismatch position is expressed as % of probe length from the 3'-end. The base [illegible] indicated on the graph.

[illegible] provides a 5' to 3' sequence listing of one target corresponding to the probes on the chip. X is a control probe. Positions that differ in the target (i.e., are mismatched with [illegible] designated site) are in bold. The SEQ ID. [illegible] corresponding to the peptide sequence shown in FIG. [illegible].

FIG. 18 shows the fluorescence image produced by scanning the chip described in FIG. 17 when hybridized to a sample.

FIG. 19 illustrates the detection of 4 transitions in the target sequence relative to the wild-type probes on the chip in FIG. 18.

FIG. 20 shows the alignment of some of the probes on a p53 DNA chip with a 12-mer model target nucleic acid. The SEQ ID. NOS. corresponding to the fourteen peptide sequences shown in FIG. 20 are 334-347, respectively.

FIG. 21 shows a set of 10-mer probes for a p53 exon 6 DNA chip. The SEQ ID. NOS. corresponding to the thirteen peptide sequences shown in FIG. 21 are 334 and 348-359, respectively.
FIG. 22 shows that very distinct patterns are observed after hybridization of p53 DNA chips with targets having different 1 base substitutions. In the first image in FIG. 22, the 12-mer probes that form perfect matches with the wild-type target are in the first row (top). The 12-mer probes with single base mismatches are located in the second, third, and fourth rows and have much lower signals.

FIG. 23, in graphs 2, 3, and 4, graphically depicts the data in FIG. 22. On each graph, the X ordinate is the position of the probe in its row on the chip, and the Y ordinate is the signal at that probe site after hybridization.

FIG. 24 shows the results of hybridizing mixed target populations of WT and mutant p53 genes to the p53 DNA chip.

FIG. 25, in graphs 1-4, shows (see FIG. 23 as well) the hybridization efficiency of a 10-mer probe array as compared to a 12-mer probe array.

FIG. 26 shows an image of a p53 DNA chip hybridized to a target DNA.

FIG. 27 illustrates how the actual sequence was read from the chip shown in FIG. 26. Gaps in the sequence of letters in the WT rows correspond to control probes or sites. Positions at which bases are miscalled are represented by letters in italic type in cells corresponding to probes in which the WT bases have been substituted by other bases. The SEQ ID. NO. corresponding to the peptide sequence shown in FIG. 27 is 360.

FIG. 28 illustrates the VLSIPS™ technology as applied to the light directed synthesis of oligonucleotides. Light (hv) is shone through a mask (M1) to activate functional groups (-OH) on a surface by removal of a protecting group (X). Nucleoside building blocks protected with photoremovable protecting groups (T-X, G-X) are coupled to the activated areas. By repeating the irradiation and coupling steps, very complex arrays of oligonucleotides can be prepared.

FIG. 29 illustrates how the VLSIPS™ process can be used to prepare "nucleoside combinatorials" or oligonucleotides synthesized by coupling all four nucleosides to form dimers, trimers, etc.

FIG. 30 shows the deprotection, coupling, and oxidation steps of a solid phase DNA synthesis method.

FIG. 31 shows an illustrative synthesis route for the nucleoside building blocks used in the VLSIPS™ method.

FIG. 32 shows a preferred photoremovable protecting group, MeNPOC, and how to prepare the group in active form.

FIG. 33 illustrates an illustrative detection system for scanning a DNA chip.

DETAILED DESCRIPTION OF THE INVENTION

Using the VLSIPS™ method, one can synthesize arrays of many thousands of oligonucleotide probes on a substrate, such as a glass slide or chip. The method can be used, for instance, to synthesize "combinatorial" arrays consisting of, for example, all possible octanucleotides. Such arrays can be used for primary sequencing-by-hybridization on genomic DNA fragments or other nucleic acids or to detect mutations in a target nucleic acid for which the normal or "wild-type"

in the nucleotide sequence of a target nucleic acid with oligonucleotide probes of defined length. The length (L) of the probe is typically expressed as the number of nucleotides or bases in a single-stranded nucleic acid probe. For purposes of the present invention, lengths ranging from 12 to 18 bases are preferred, although shorter and longer lengths can also be employed. To employ the tiling method, one synthesizes a set of probes defined by the particular nucleotide [illegible] in the target DNA segment, one synthesizes a probe complementary to [illegible] subsequence of the target nucleic acid beginning [illegible] L-1 bases to the [illegible]-side (see [illegible]).

In a preferred embodiment of the invention, the probes are [illegible] (immobilization typically by covalent attachment of a pre-synthesized probe or by synthesis of the probe on the substrate) on the substrate or chips in lanes stretching across the chip and separated, and these lanes are [illegible] blocks of preferably 5 lanes, although blocks of other sizes will have useful application, as will be apparent from the following illustration. The first of these five lanes, called the "wild-type lane", contains probes arranged in order of sequence, and all of the probes are complementary to a specified wild-type nucleic acid sequence. The other four lanes contain probe sets for detecting [illegible] possible single-base mutations in the defined sequence; in turn, these probe sets are defined by a position of potential non-complementarity in the probe relative to the target (i.e., a single base mismatch) and the identity of the nucleotide in the probe at that position (i.e., whether the nucleotide is an A, C, G, or T nucleotide). The position of mismatch, also called the position of substitution, is preferably selected to be near the center of the probes, i.e., position 7 of a probe of L=15.

For each probe in the wild-type lane, one synthesizes four probes (one for each of the lanes other than the wild-type lane). Three of these four probes are identical to the corresponding wild-type probe but for the base at the position of substitution, and the remaining probe is identical to the wild-type probe. This set of four substitution probes is preferably placed in a column directly below (or above) the corresponding wild-type probe, thus creating an A-lane, a C-lane, a G-lane, and a T-lane. FIG. 2 shows an illustrative tiled array of the invention with probes for the detection of point mutations. The base at the position of substitution in each of the wild-type probes is shown in the wild-type lane, and the shading shows the location of the substitution probe having the wild-type sequence. Below are the probes that would be placed in the column marked by the arrow if the probe length were 15 and the position of substitution were 7:

3'-CCGACTGCAGTCGTT (SEQ. ID. NO:1)
3'-CCGACTACAGTCGTT (SEQ. ID. NO:2)
3'-CCGACTCCAGTCGTT (SEQ. ID. NO:3)
3'-CCGACTGCAGTCGTT (SEQ. ID. NO:1)
3'-CCGACTTCAGTCGTT (SEQ. ID. NO:4)

Thus, the substitution lanes occupy four of the five lanes separating successive wild-type lanes on the chip; the blocks of five lanes can be separated by a sixth lane for measurement of background signals.
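
As an illustration of how a column set follows from a single wild-type probe, the sketch below regenerates the four substitution probes of the example listed above (SEQ. ID. NO:1 to NO:4). It is a minimal sketch in Python and not the synthesis method of the invention: the function name column_set, the lane labels, and the convention of writing each probe as a string read from its 3'-end are assumptions made only for this example.

```python
BASES = "ACGT"


def column_set(wild_type_probe: str, position: int = 7) -> dict:
    """Return the four substitution probes for one wild-type probe.

    The probe is written 3'->5', as in the text, and `position` is the
    position of substitution counted from the 3'-end, starting at 1.
    """
    if not 1 <= position <= len(wild_type_probe):
        raise ValueError("position of substitution lies outside the probe")
    i = position - 1
    return {
        f"{base}-lane": wild_type_probe[:i] + base + wild_type_probe[i + 1:]
        for base in BASES
    }


if __name__ == "__main__":
    wild_type = "CCGACTGCAGTCGTT"  # SEQ. ID. NO:1, written 3'->5'
    for lane, probe in column_set(wild_type).items():
        note = "  (identical to the wild-type probe)" if probe == wild_type else ""
        print(f"{lane}: 3'-{probe}{note}")
```

Exactly one of the four probes produced this way equals the wild-type probe (here the G-lane probe), which matches the statement that one substitution probe in each column set has the wild-type sequence.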

nucleotide sequence is already known. Using the preferred method of the invention, one employs a strategy called "tiling" to synthesize specific sets of probes at spatially-defined locations on a substrate, creating the novel probe arrays and "DNA chips" of the invention.

To illustrate the tiling method of the invention, consider the problem of detecting mutations at one or more position

The DNA chips of the invention have a wide variety of applications. In one embodiment, the DNA chip is used to select an optimal probe from an array of probes. In this embodiment, an array of probes of variable length and sequences is synthesized and then hybridized to a target nucleic acid of known sequence. The pattern of hybridization reveals the optimal length and sequence composition of
probes to detect a particular mutation or other specific sequence of nucleotides. In some circumstances, i.e., target nucleic acids with repeated sequences or with high G/C content, very long probes may be required for optimal detection. In one embodiment for detecting specific sequences in a target nucleic acid with a DNA chip, repeat sequences are detected as follows. The chip comprises probes of length sufficient to extend into the repeat region varying distances from each end. The sample, prior to hybridization, is treated with a labeled oligonucleotide that is complementary to a repeat region but shorter than the full length of the repeat. The target nucleic acid is labeled with a [illegible] have bound both the labeled target and the labeled oligonucleotide probe; the presence of such bound probes shows that at least two repeat sequences are present.

A variety of methods can be used to enhance detection of labeled targets bound to a probe on the array. In one embodiment, the protein MutS (from E. coli) or equivalent proteins such as yeast MSH1, MSH2, and MSH3; mouse Rep-3, and Streptococcus Hex-A, is used in conjunction with target hybridization to detect probe-target complexes that contain mismatched base pairs. The protein, labeled directly or indirectly, can be added to the chip during or after hybridization of target nucleic acid, and differentially binds to homo- and heteroduplex nucleic acid. A wide variety of dyes and other labels can be used for similar purposes. For instance, the dye YOYO-1 is known to bind preferentially to nucleic acids containing sequences comprising runs of 3 or more G residues.

The DNA chips produced by the methods of the invention can be used to study and detect mutations in exons of human genes of clinical interest, including point mutations and deletions. In the following sections, the method of the invention is illustrated by the detection of mutations in a variety of clinically and medically significant human nucleic acid sequences. Thus, the invention is illustrated first with respect to the preparation of DNA chips for the detection of mutations associated with cystic fibrosis, then with DNA chips for the detection of human mitochondrial DNA sequences, then with DNA chips for the detection of mutations in the human p53 gene associated with cancer, and finally with respect to the detection of mutations in the HIV RT gene associated with drug resistance.

Detection of Cystic Fibrosis Mutations with DNA Chips

A number of years ago, cystic fibrosis, the most common severe autosomal recessive disorder in humans, was shown to be associated with mutations in a gene thereafter named the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene. The sequences of the exons and parts of the introns in the gene are known, as are the changes corresponding to several hundred known mutations. Several tests have been developed for detecting the most frequent of these mutations. The present invention provides CFTR gene oligonucleotide arrays (DNA chips) that can be used to identify mutations in the CFTR gene rapidly and efficiently.

The methods used to make the high-density DNA chips of the invention allow probes for long stretches of DNA coding regions to be directly "written" onto the chips in the form of sets of overlapping oligonucleotides. These methods have been used to develop a number of useful CFTR gene chips; one illustrative chip bears an array of 1296 probes covering the full length of exon 10 of the CFTR gene arranged in a 36x36 array of 356 μm elements. The probes in the array can have any length, preferably in the range of from 10 to 18 residues and can be used to detect and sequence any single-base substitution and any deletion within the 192-base exon, including the three-base deletion known as ΔF508. As described in detail below, hybridization of sub-nanomolar concentrations of wild-type and ΔF508 oligonucleotide target nucleic acids labeled with fluorescein to these arrays produces highly specific signals (detected with confocal scanning fluorescence microscopy) that permit discrimination between mutant and wild-type target sequences in both homozygous and heterozygous cases. The method and chips of the invention can also be used to detect other known [illegible] fibrosis mutation is known as ΔF508, because the mutation is a three-base deletion that [illegible] amino acid #508 from the CFTR [illegible] detecting ΔF508, one such chip results from applying the tiling method to exon 10 of the CFTR gene, the exon to which ΔF508 has been mapped. The tiling method involved the synthesis of a set of probes of a selected length in the range of from 10 to 18 bases and complementary to subsequences of the known wild-type CFTR sequence starting at a position a few bases into the intron on the 5'-side of exon 10 and ending a few bases into the intron on the 3'-side. There was a probe for each possible subsequence of the given segment of the gene, and the probes were organized into a "lane" in such a way that traversing the lane from the upper left-hand corner of the chip to the lower right-hand corner corresponded to traversing the gene segment base-by-base from the 5'-end. The lane containing that set of probes is, as noted above, called the "wild-type lane."

Relative to the wild-type lane, a "substitution" lane, called the "A-lane", was synthesized on the chip. The A-lane probes were identical in sequence to an adjacent (immediately below the corresponding) wild-type probe but contained, regardless of the sequence of the wild-type probe, a dA residue at position 7 (counting from the 3'-end). In similar fashion, substitution lanes with replacement bases dC, dG, and dT were placed onto the chip in a "C-lane," a "G-lane," and a "T-lane," respectively. A sixth lane on the chip consisted of probes identical to those in the wild-type lane but for the deletion of the base in position 7 and restoration of the original probe length by addition to the 5'-end the base complementary to the gene at that position.

The four substitution lanes enable one to deduce the sequence of a target exon 10 nucleic acid from the relative intensities with which the target hybridizes to the probes in the various lanes. The probe organization on the chip can be conveniently columnar, and the set of probes consisting of a wild-type probe and four corresponding substitution probes is referred to as a "column set." One and only one of the four substitution probes in a column set has exactly the same sequence as the wild-type probe in the set. Those of skill in the art will appreciate that, in other embodiments of the invention, one could delete one or more lanes or columns and still benefit from the invention. Various versions of such exon 10 DNA chips were made as described above with probes 15 bases long, as well as chips with probes 10, 14, and 18 bases long. For the results described below, the probes were 15 bases long, and the position of substitution was 7 from the 3'-end.

To demonstrate the ability of the chip to distinguish the ΔF508 mutation from the wild-type, two synthetic target nucleic acids were made. The first, a 39-mer complementary to a subsequence of exon 10 of the CFTR gene having the three bases involved in the ΔF508 mutation near its center, is called the "wild-type" or wt508 target, corresponds to positions 111-149 of the exon, and has the sequence shown below:
5'-CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGA (SEQ. ID NO:5).

The second, a 36-mer probe derived from the wild-type target by removing those same three bases, is called the "mutant" target or mu508 target and has the sequence shown below, first with dashes to indicate the deleted bases, and then without dashes but with one base underlined (to indicate the base detected by the T-lane probe, as discussed below):

5'-CATTAAAGAAAATATCAT---TGGTGTTTCCTATGATGA (SEQ. ID NO:6)
5'-CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA (SEQ. ID NO:7)

Both targets were labeled with fluorescein at the 5'-end.

In three separate experiments, the wild-type target, the mutant target, and an equimolar mixture of both targets were exposed (0.1 nM wt508, 0.1 nM mu508, and 0.1 nM wt508 plus 0.1 nM mu508, respectively, in a solution compatible with nucleic acid hybridization) to a CF chip. The hybridization mixture was incubated overnight at room temperature, and then the chip was scanned on a reader (a confocal fluorescence microscope in photon-counting mode; images of the chip were constructed from the photon counts) at several successively higher temperatures while still in contact with the target solution. After each temperature change, the chip was allowed to equilibrate for approximately one-half hour before being scanned. After each set of scans, the chip was exposed to denaturing solvent and conditions to wash, i.e., remove target that had bound, the chip so that the next experiment could be done with a clean chip.

The results of the experiments are shown in FIGS. 3, 4, 5, and 6. FIG. 3, in panels A, B, and C, shows an image made from the region of a DNA chip containing CFTR exon 10 probes; in panel A, the chip was hybridized to a wild-type target; in panel C, the chip was hybridized to a mutant ΔF508 target; and in panel B, the chip was hybridized to a mixture of the wild-type and mutant targets. FIG. 4, in sheets 1-3, corresponding to panels A, B, and C of FIG. 3, shows graphs of fluorescence intensity versus tiling position. The labels on the horizontal axis show the bases in the wild-type sequence corresponding to the position of substitution in the respective probes. Plotted are the intensities observed from the features (or synthesis sites) containing wild-type probes, the features containing the substitution probes that bound the most target ("called"), and the feature containing the substitution probes that bound the target with the second highest intensity of all the substitution probes ("2nd Highest").

whose point of substitution corresponds to the T at the 3'-end of the deletion was very close to background. Following that pattern, the wild-type probe whose point of substitution corresponds to the middle base (also a T) of the deletion bound still less target. However, the probe in the T-lane of that column set bound the target very well.

Examination of the sequences of the two targets reveals that the deletion places an A at that position when the sequences are aligned at their 3'-ends and that the T-lane probe is complementary to the mutant target with but two mismatches near an end (shown below in lower-case letters with the position of substitution underlined):

Target: 5'-CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA
Probe: 3'-TagTAGTAACCACAA (SEQ. ID NO:8)

Thus the T-lane probe in that column set calls the correct base from the mutant sequence. Note that, in the graph for the equimolar mixture of the two targets, that T-lane probe binds almost as much target as does the A-lane probe in the same column set, whereas in the other column sets, the probes that do not have wild-type sequence do not bind target at all as well. Thus, that one column set, and in particular the T-lane probe within that set, detects the ΔF508 mutation under conditions that simulate the homozygous case and also conditions that simulate the heterozygous case.

The present invention thus provides individual probes, sets of probes, and arrays of probe sets on chips, in specific patterns, as the probes provide important benefits for detecting the presence of specific exon 10 sequences. The sequences of several important probes of the invention are shown below. In each case, the letter "X" stands for the point of substitution in a given column set, so each of the sequences actually represents four probes, with A, C, G, and T, respectively, taking the place of the "X." Sets of shorter probes derived from the sets shown below by removing up to five bases from the 5'-end of each probe and sets of longer probes made from this set by adding up to three bases from the exon 10 sequence to the 5'-end of each probe, are also useful and provided by the invention.

3'-TTTATAXTAGAAACC (SEQ. ID NO:9)
3'-TTATAGXAGAAACCA (SEQ. ID NO:10)
3'-TATAGTXGAAACCAC (SEQ. ID NO:11)
3'-ATAGTAXAAACCACA (SEQ. ID NO:12)
3'-TAGTAGXAACCACAA (SEQ. ID NO:13)
3'-AGTAGAXACCACAAA (SEQ. ID NO:14)
3'-GTAGAAXCCACAAAG (SEQ. ID NO:15)
3'-TAGAAAXCACAAAGG (SEQ. ID NO:16)
3'-AGAAACXACAAAGGA (SEQ. ID NO:17)
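
The "X" notation in the probe list above can be expanded mechanically, and the pairing of a probe with a target can be checked by counting mismatches. The following is a minimal Python sketch under stated assumptions: probes are written 3'->5' and targets 5'->3', so that the two strings pair position by position; the helper names expand_probe and best_alignment are hypothetical; and the mismatch count is only a bookkeeping aid, not a model of hybridization strength.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand_probe(pattern: str) -> list:
    """Replace the single 'X' in a probe pattern with A, C, G and T in turn."""
    if pattern.count("X") != 1:
        raise ValueError("expected exactly one 'X' in the pattern")
    return [pattern.replace("X", base) for base in "ACGT"]


def best_alignment(probe_3to5: str, target_5to3: str) -> tuple:
    """Slide the probe along the target and return (offset, mismatches)
    for the placement with the fewest mismatched base pairs."""
    best = None
    for offset in range(len(target_5to3) - len(probe_3to5) + 1):
        window = target_5to3[offset:offset + len(probe_3to5)]
        mismatches = sum(COMPLEMENT[t] != p for t, p in zip(window, probe_3to5))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


if __name__ == "__main__":
    mu508 = "CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA"  # mutant target, 5'->3'
    t_lane_probe = expand_probe("TAGTAGXAACCACAA")[3]  # X replaced by T
    print(t_lane_probe)                         # TAGTAGTAACCACAA
    print(best_alignment(t_lane_probe, mu508))  # (10, 2)
```

The T-lane member of the SEQ. ID NO:13 pattern is the probe shown as SEQ. ID NO:8, and its best placement on the mutant target leaves two mismatches, in agreement with the two lower-case bases in the alignment given in the text.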

These figures show that, for the wild-type target and the equimolar mixture of targets, the substitution probe with a nucleotide sequence identical to the corresponding wild-type probe bound the most target, allowing for an unambiguous assignment of target sequence as shown by letters near the points on the curve. The target wt508 thus hybridized to the probes in the wild-type lane of the chip, although the strength of the hybridization varied from probe-to-probe, probably due to differences in melting temperature. The sequence of most of the target can thus be read directly from the chip, by inference from the pattern of hybridization in the lanes of substitution probes (if the target hybridizes most intensely to the probe in the A-lane, then one infers that the target has a T in the position of substitution, and so on).

For the mutant target, the sequence could similarly be called on the 3'-side of the deletion. However, the intensity of binding declined precipitously as the point of substitution approached the site of the deletion from the 3'-end of the target, so that the binding intensity on the wild-type probe

Although in this example the sequence could not be reliably deduced near the ends of the target, where there is not enough overlap between target and probe to allow effective hybridization, and around the center of the target, where hybridization was weak for some other reason, perhaps high AT-content, the results show the method and the probes of the invention can be used to detect the mutation of interest. The mutant target gave a pattern of hybridization that was very similar to that of the wt508 target at the ends, where the two share a common sequence, and very different in the middle, where the deletion is located. As one scans the image from right to left, the intensity of hybridization of the target to the probes in the wild-type lane drops off much more rapidly near the center of the image for mu508 than for wt508; in addition, there is one probe in the T-lane that hybridizes intensely with mu508 and hardly at all with wt508. The results from the equimolar mixture of the two targets, which represents the case one would encounter in testing a heterozygous individual for the mutation, are a
blend of the results for the separate targets, showing the
power of the invention to distinguish a wild-type target
sequence from one containing the ΔF508 mutation and to
detect a mixture of the two sequences.
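
The reading rule used for these results (the target base is the complement of the substitution lane that binds the most target, and a strong second signal suggests a mixture) can be sketched as follows. This is a minimal Python sketch, not the analysis used to produce the figures: the function name call_column and the mixture_ratio threshold of 0.5 are assumptions chosen only for illustration, and the intensity values are invented.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_column(intensities: dict, mixture_ratio: float = 0.5) -> list:
    """Infer the target base(s) at one position of substitution.

    `intensities` maps each substitution lane ("A", "C", "G", "T") to the
    signal of its probe in one column set. The strongest lane gives the
    "called" base; the second lane is also reported when its signal is at
    least `mixture_ratio` times the strongest (an assumed cutoff).
    """
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    called, second = ranked[0], ranked[1]
    target_bases = [COMPLEMENT[called]]
    top = intensities[called]
    if top > 0 and intensities[second] / top >= mixture_ratio:
        target_bases.append(COMPLEMENT[second])
    return target_bases


if __name__ == "__main__":
    # One clear signal: the A-lane probe binds best, so the target has a T.
    print(call_column({"A": 950, "C": 40, "G": 35, "T": 30}))   # ['T']
    # Two comparable signals, as expected for an equimolar mixture.
    print(call_column({"A": 900, "C": 40, "G": 35, "T": 820}))  # ['T', 'A']
```

A real cutoff would have to be set from background and control features on the chip; the fixed ratio here only makes the "called" versus "2nd Highest" comparison concrete.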

The results above clearly demonstrate how the DNA chips
of the invention can be used to detect a deletion mutation,
ΔF508; another model system was used to show that the
chips can also be used to detect a point mutation as well. One
of the more frequent mutations in the CFTR gene is G480C,
which involves the replacement of the G in position 46 of
exon 10 by a T, resulting in the substitution of a cysteine for
the glycine normally in position #480 of the CFTR protein.
The model target sequences included the 21-mer probe
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tcnis. The wild-type sequence could easily be read from the 
chip, but the probe that bound the mu480 target so well when 
only the mu480 target was present also bound it well when 
both the mutant and wild- type targets were present in a 
mixture, making the hybridization pattern easily "distinguish- 
able from, that of the wild-type target alone! These, results^ 
again show the power of the DNA chips of the invention to 
detect point mutations in both homo- and heterozygous 
individuals. 

To demonstrate cUm'cal apph'cation of the DNA chips of 
the invention, the chips were used to study and detect 
mutations in nucleic acids from genomic samples. Genomic 



wt4S0 to represent the wild-type sequence at positions samples from a individual carrying only the wild-type gene 

37-55 of cxon 10: S'-CCTTCAGAGGGTAAAAITAAG 15 andanindividualheterozygousforAF508 were amplified by 

(SEQ. ID N0:18) and the 21-m er probe mu480 to represent PGR using exon 10 primers containing the promoter for T7 

the muunt sequence; S'-CCTTCAGAGTGTAAAAnTAAG RNA polymerase. lUustrativc primers of the invention arc 

(SEQ. ID N0:19). shown below. 



Exoa Name Sequccce 



10 Cri9-T7 TAATACGACTCACTATAGOGAGatgicrtutaatgaEgggtu (SEQ. LD. NO:20) 

10 CniOc-T7 TAATACGACrCACTATAGGGAGtigtgtsugggticatotgc (SEQ. ID. N0:2l) 

10 CraOc-T3 CKXjGAATIAAOCCrCACrAAAGGiagtgtgsagggttcautg (SEQ. ID. NO:22) 
10,11 CFilO-T7 TAAJACGACrCACTATAGGGAGigcstictiaMgtgactctc (SEQ. ID. NO:23) 

11 CFillc-T7 TAArACGACrCACrXrAGCGAGaatgaargaatuacagcaa (SEQ. ID. NO:24) 
11 CFillc-T3 CGGA-OTAACCCrCACTAAAGCacatgaatgacatUacagcaa (SEQ. ID. KOOS) 



In separate experiments, a DNA chip was hybridized to These primers can be used to amplify cxon 10 or cxon 11 
each of the targets wt480 and mu4S0, respectively, and then .sequences; in another embodiment, multiplex PGR is 
scanned with a confocal microscope. FIG. 5, in panels A. B, employed, using two or more pairs of primers to amplify 
and C, shows an image made from the region of a DNA chip more than one cxon at a time. 

containing CFTR cxon 10 probes; in panel A, the chip was The product of amplificaUon was then used as a template 
hybridized to the wt480 target; in panel C, the chip was 35 for the RNA polymerase, with fluoresceinated UTP present 
hybridized to the mu4S0 target; and in panel B, the chip was to label the RNA product After sufficient RNA was made, 
hybridized to a mixture of the wild-type and mutant targets. it was fragmented and appUcd to an cxon 10 DNA chip for 
FIG. 6, in sheets 1-3, corresponding to panels A, B, and C 15 minutes, after which the chip was washed with hybrid- 
of FIG. 5, shows graphs of fluorescence intensity versus ization buffer and scanned with the fluorescence micro- 
tiling position. The labels on the horizontal axis show the 40 scope. A useful positive control included on many CF cxon 
bases in the wild-type sequence corresponding to the posi- 10 chips is the 8-mer 3'<:GCCGCCG-5\ FIG. 7, in panels 
lion of substitution in the respective probes. Plotted arc the A and B, shows an image made from a region of a DNA chip 
intensities observed from the features (or synthesis sites) containing CFTR cxon 10 probes; in panel A, the chip was 
containing wfld-type probes, the features containing the hybridized to nucleic add derived from the genomic DNA of 
substitution probes that bound the most Urget ("called"), and 45 an individual with wild-type AF508 sequences; in panel B, 
the feature containing the substitution probes that bound the the target nucleic acid originated from a heterozygous (with 
Urgel with the second highest intensity of all the substitution respect to the AF508 mutation) individual. FIG. 8, in sheets 
probes ("2nd Highesr). 1 and 2, corresponding to panels A and B of FIG. 7, shows 

These figures show that the chip could be used to graphs of fluorescence intensity versus tiling position, 
sequence a 16-basc stretch from the center of the target 50 These figures. show that the sequence of. the wild-type, 
wt480 and that discrimination against mismatches is quite RNA can be called for most of the bases near the mutation, 
good throughout the sequenced region. When the DNA chip In the case of the AF508 heterozygous carrier, one pardcular 
was exposed to the target mu480, only one probe in the probe, the same one that distinguished so clearly between 
portion of the chip shown bound the target well: the probe the wild-type and muUnt oligonucleotide targets in the 
in the set of probes devoted to identifying the base at 55 model system described above, in the T-lane binds a large 
position 46 in cxon 10 and that has an A in the position of amount of RNA. while the same probe binds little RNAfrom 
substitution and so is fully complcrnentary to the central the wild-type individual. These results show that the DNA 
portion of the muUnt UrgeL All other probes in that region chips of the invention are capable of detecting the AF508 
of the chip have at least one mismatch with the mutant target mutau'on in a heterozygous carrier, 
and therefore bind much less of it. In spite of that fact, the 60 Thus, the present invention provides methods for synthc- 
scquence of mu480 for several positions to both sides of the sizing large numbers of oUgonucleou'de probes on a glass 
mutation can be read from the chip, albeit with much- substrate and um'que probe sets in a defined array in which 
reduced intensities from those observed with the wild-type the probes are arranged in the array by the "tiling" method 
^^c^- of the invention. The DNA chips produced by the method 

The results also show that, when the two targets were 65 can be used to detect mutations in particular sequences of a 
mixed together and exposed to the chip, the hybridization Urgct nucleic acid, such as genomic DNA or RNA produced 
pattern observed was a combination of the other two pat- from transcription of an amplified genomic DNA. These 
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chips cao be used to detect both point mutations and small 
deletions. Moreover, the pattern of hybridization to the chip 
allows inferences to be drawn about the sequences of the 
mutant DNAs. 
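The "tiling" arrangement and the inference it supports can be illustrated in silico. The sketch below builds, for every window of a wild-type reference, four probes that differ only at the position of substitution, and then reports which probes are perfectly complementary to a given target. It uses the two 21-mer model sequences quoted for the G480C system; the probe length of 9, the central substitution offset, and all function names are assumptions made for illustration and are not taken from the specification.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile(reference, probe_len, sub_offset):
    """Build four probes (one per lane) for every window of the reference.

    Each entry is (start, position, lane, probe): `start` is the window's
    0-based start on the reference, `position` is the interrogated base,
    `lane` is the base the probe carries opposite that position, and
    `probe` is written 5'->3'.
    """
    probes = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        for base in "ACGT":
            # Target-sense footprint this probe would match perfectly.
            footprint = window[:sub_offset] + base + window[sub_offset + 1:]
            probes.append((start, start + sub_offset, COMPLEMENT[base],
                           reverse_complement(footprint)))
    return probes


def perfect_matches(probes, target):
    """List (position, lane) for probes fully complementary to the target
    at the window they were designed for."""
    hits = []
    for start, position, lane, probe in probes:
        if reverse_complement(probe) == target[start:start + len(probe)]:
            hits.append((position, lane))
    return hits


# 21-mer model sequences; 0-based index 9 is the G->T site.
WT480 = "CCTTCAGAGGGTAAAATTAAG"
MU480 = "CCTTCAGAGTGTAAAATTAAG"

# Probe length 9 with the substitution in the middle is an assumption.
chip = tile(WT480, probe_len=9, sub_offset=4)
site = 9
print([lane for pos, lane in perfect_matches(chip, WT480) if pos == site])
print([lane for pos, lane in perfect_matches(chip, MU480) if pos == site])
```

With these assumptions the wild-type target matches the C-lane probe at the mutation site and the mutant target matches the A-lane probe, which mirrors the inference in the text that a strong A-lane signal at that position indicates a T in the target. Probes whose window overlaps the mutation at any other offset no longer match the mutant perfectly, which is why the calls near the mutation weaken.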
For example, in the model system involving the cystic fibrosis point mutation G480C, the A-lane probe whose position of substitution corresponds to the position of the mutation does not bind much wild-type target, because in the wild-type sequence, a G occupies that position. However, it binds mutant target very well, allowing one to infer correctly that the mutation involves a change of that G to a T.

Similarly, in the case of the three-base deletion in cystic fibrosis known as ΔF508, the T-lane probe that binds mutant target so intensely is responding to the fact that the deletion has brought a CAT sequence into the position occupied by a CTT sequence in the wild-type target. The DNA chips of the invention can be used to detect and sequence not only known mutations in an organism's genome but also new mutations not previously characterized. The DNA chips and methods of the invention can also be used to detect specific sequences in other CFTR exons as well as other human genes for purposes of research and clinical genetic analysis, as demonstrated below.

Detection of Specific Human Mitochondrial DNA Sequences with DNA Chips

As noted above, the present invention provides DNA chips on which a known DNA sequence is represented as an array of overlapping oligonucleotides on a solid support. This set of oligonucleotides is used to probe a target nucleic acid comprising the known sequence, allowing mutations to be detected. As also noted above, there are advantages in some applications to using a minimal set of oligonucleotides specific to the sequence of interest, rather than a set of all possible N-mers. Some of these advantages include: (i) each position in the array is highly informative, whether or not hybridization occurs; (ii) nonspecific hybridization is minimized; (iii) it is straightforward to correlate hybridization differences with sequence differences, particularly with reference to the hybridization pattern of a known standard; and (iv) the ability to address each probe independently during synthesis, using high resolution photolithography, allows the array to be designed and optimized for any sequence. For example, the length of any probe can be varied independently of the others.

The present invention illustrates these advantages by providing DNA chips and analytical methods for detecting specific sequences of human mitochondrial DNA. In one preferred embodiment, the invention provides a DNA chip for analyzing sequences contained in a 1.3 kb fragment of human mitochondrial DNA from the "D-loop" region, the most polymorphic region of human mitochondrial DNA. One such chip comprises a set of 269 overlapping oligonucleotide probes of varying length in the range of 9-14 nucleotides with varying overlaps arranged in ~600x600 micron features or synthesis sites in an array 1 cm x 1 cm in size. The probes on the chip are shown in columnar form below. An illustrative mitochondrial DNA chip of the invention comprises the following probes (X, Y coordinates are shown, followed by the sequence; "DL3" represents the 3'-end of the probe, which is covalently attached to the chip surface.)



0 0 DL3AGTCGOGTATTT 

1 0 DUOGCnxnTAGTT 

2 0 DUTTAGTrrArcCAA 
J 0 DUATCCAAACCAGG 

4 0 DUACCAGGAXCGGA 

5 0 DUCXxTGTGTCTGTGG 

6 0 DUCGTGTGTGTOTGGC (SEQ ID. NO:32) 

7 0 DL3TCGTGTGTGTGTGG (SEQ ID. NO J J) 

8 0 DUGTAGGATGGGTC 

9 0 DUAGGATGGGTCGT 

10 0 DL3GArGGGTCGTGT 

11 0 DUTCGCGACGATTG 

12 0 DUGCGACGATrGGG 

13 0 DUTtiGGGGGGA 

14 0 DUGAGGGGGCG 

15 0 DL3GGAGGGGGCGA 

16 0 DL3GAGGGOGCCA 

0 1 DUGGCrrGGTTGG 

1 1 DUGGilOOiriOGG 

2 1 DUTGGGGrrrCTAG 

3 1 DLJGTTrCTAGTGGG 

4 1 DL3AGT30GGGGTCT 

5 1 DL3G0GGTGTCAAAr 

6 1 DUGTCAAATACATCG 

7 1 DUACATCOAATGOAG 

8 1 DUCGAATOGAGOAG 

9 1 DUGAGGACmrCCT 

10 1 D U 111 Oil lA TGTGA 

11 1 DUATOTGACrmAC 

12 1 DUGACmTACAAAr 

13 1 DLSAAATCnSCCCGA 

14 I DUAATCTCCCCGAG 

15 1 DL3CCCGAGTGTAGT 

16 1 DUAGTCTAGTGGCG 

0 2 DUGGGAGGGTCAG 

1 2 DUGOKfAGGGTATG 

2 2 DUGGTATGATOATTAG 

3 2 DUGATTAGAGTAAGT 

4 2 DLSTTAGAGTAAGTTA 



(SEQ ID. KO'26) 
(SEQ ID. N027) 
(SEQ ID. NO:28) 
(SEQ ID. NO:29) 
(SEQ ID. NO-JO) 
(SEQ ID. NO:31) 



(SEQ ID. NO:34) 
(SEQ ID. N0.35) 
(SEQ ID. tsO'JS) 
(SEQ ID. NO:37) 
(SEQ ID. NO:38) 



(SEQ ID. NO:39) 
(SEQ ID. NO:40) 
(SEQ ID. NO:41) 
(SEQ ID. NO:42) 
(SEQ ID. NO:43) 
(SEQ ID. NO:44) 
(SEQ ID. NO:45) 
(SEQ ID. N0:4tf) 
(SEQ ID. NO:47) 
(SEQ ID. NO:48) 
(SEQ ID. NO:49) 
(SEQ ID. NO:50) 
(SEQ ID. NO:51) 
(SEQ ID. NO:52) 
(SEQ ID. NO-J3) 
(SEQ ID. NO:543 
(SEQ ID. N0 J5) 
(SEQ ID. SO JS) 
(SEQ ID. NO J7) 
(SEQ ID. NO-J8) 
(SEQ ID. NO:59) 
(SEQ ID. NO:60) 
(SEQ ID. N0:61) 
(SEQ ID. Na62) 



9 2 DUGGTAGGATGGGT 

10 2 DUGGATGGGTCGTG 

11 2 DUGGTCGTGTGTGT 

12 2 DL3GTGTGTGTGGCG 

13 2 DUTGTGGCGACGAT 

14 2 DUGACCATTGGOGT 

15 2 DL3AITGGGGTArGG 

16 2 DUGTATGGGGCTTG 

0 3 DUGGATTGTGGTCG 

1 3 DUTGGTCGGArrGG 

2 3 DL3GGArrGGTCTAAA 

3 3 DUTCTAAAGTTTAAA 

4 3 DUGTrEAAAATAGAA 

5 3 DUATAGAAAAACCG 

6 3 DUAGAAAAACCXjC 

7 3 DL3AACCGCCATAC 

8 3 DL3CCATACGTGAAAA 

9 3 DL3ACGTGAAAATTGT 

10 3 DUAAITGTCAGTGGG 

11 3 DUtCTCAGTCGGGG 

12 3 DUTGGGGTTGA 

13 3 bL3GGGTrGArTGTCT 

14 3 DL3TTGTGTAAIAAAA 

15 3 DUAATAAAAGGGGA 

16 3 DL3TAAAAGGGGAGG 

0 4 DL3 01 i U 1 lA AAGG 

1 4 DUl I i iAAAGGTGO 

2 4 DUAGGTXK/TTTGG 

3 4 DL3TTGGGGGGGAG 

4 4 DL3GGAGGGGGCG 

5 4 DL3GGGGCGAAGAC 

6 4 DUGAAGACCGGATG 

7 4 DLJCCGGArGTCGTG 

8 4 DL3GTCGTGA Alli01 

9 4 DL3CGTOAATTTGTCT 

10 4 DUTTGTGTAGAGAOG 

11 4 DLnAGAGACGOTTT 

12 4 DUACGCrrrrGGGG 

13 4 DUTGGGGTmTGT 

14 4 DUGGGlllilGTIT 



(SEQ ID. NO:67) 
(SEQ ID. N0:6S) 
(SEQ ID. NO:69) 
(SEQ ID. NO:70) 
(SEQ ID. NO:71) 
(SEQ ID. NO:72) 
(SEQ ID. NO:73) 
(SEQ ID. NO:74) 
(SEQ ID. NO:75) 
(SEQ ID. NO:76) 
(SEQ ID. NO:77) 
(SEQ ID. NO:78) 
(SEQ ID. NO:79) 
(SEQ ID. NO:S0) 
(SEQ ID. NOiSl) 
(SEQ ID. NO:82) 
(SEQ ID. NO'.83) 
(SEQ ID. NO:84) 
(SEQ ID. NO:85) 
(SEQ ID. tiOJSS) 
(SEQ ID. NO:87) 
(SEQ ID. KO:88) 
(SEQ ID. Na89) 
(SEQ ID. NO:90) 
(SEQ ID. N0:91) 
(SEQ ID. NO:92) 
(SEQ ID. NO:93) 
(SEQ ID. N054) 
(SEQ ID. SOSS) 
(SEQ ID. N056) 
(SEQ ID. NO:97) 
(SEQ ID. NO:98) 
(SEQ ID. NO:99) 
(SEQ ID. NO:100) 
(SEQ ID. NaiOl) 
(SEQ ID. NO:102) 
(SEQ ID. NO:103) 
(SEQ ID. NO:104) 
(SEQ ID. NO:105) 
(SEQ ID. NO:106) 
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2 DU AAOT rATGTTGGG (SEQ ID. N0:&3) 
2 DUGTTGGGGGCG (SEQ ID. N0:6*) 

2 . pUGGGGGGGGTA .. (SEQ ID. NO:dS) 
2 . DUOCGGOTAGOAT (SEQ ID. N0:6d) 
5 DUACACAATTAArTAA . (SEQ ID. n6:1U) 
5 .DL3 AATrAATTACOAA (SEQ ID. NO:l J 2) 

5 DL3TACGAACATCCTC 

5 DUACXjAACATCCTXjT 
5 DUTCCrGTAnVOTA 
5 DUGTArMTAITGTr 
5 DUXTrGTrAAACTTA 
5 DUAAACTTACAGACG 
5 DUACAGACGTGTCG 
5 DUGTGTCGGTGAAA 
5 DL3GTGAAAGGTGTCT 
5 DUGGTGTCTCTGTAG 
5 DL3TGTCTCTCTAGTA 
DUGTAGTATTGrnT 
DUAGTATTG iliUr 
DUOCTCGTGGGATA 
DUTGGGATACAGCG 
DUGATACAGCGTCAT 
DUGCGTCATAGACAG 
DL3AGACAGAAACTAA 
DUCAGAAACTAAGGA 
DLnXAGGACGOAGT 
DUGACGGAGTAGGA 
DUGTAGGATAATAAA 
DLrTAATAAATAGCG 
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DUATAGCGTAGGAX 
DLHAGCGTAGGATG 
DUAGGATGCAAGTT 
DUAT3CAAGTTATAA 
DUGTTATAATGTCCG 
DL3ATCTCCGCITGT 
6. DLnXXX<nTGTAro 
7 DUGTGAGTGCCCrc 
7 DUTGCCCTCGAGAG 
7 DLSCCTCGAGAGGTVV 
7 DUAGAGGTACGTAA 
7 DUACGTAAACXATA 
7 DUACCATAAAAGCAG 
7 DUAAAGCAGACCX: 
7 DUAGACXCCCCAT 
7 DUCXICCCATACGT 
7 DLSCATACGTGCGCT 
7 DUGTGCGCTATCAG 
7 DUGCGCTATCAGTA 
7 DLSTCAGTAACXKnt: 
7 DUGTAACGCrCTGC 
10 DUAGTCTATCCCCA 
10 DUAICCCCAGGGA 
10 DUCAGGGAACTGGT 
10 DUACTGGTGGTAGG 
10 DUCTGGTGGTAGGA 
10 DUCTAGGAGGCACA 
10 DUGGCACAnTAGT 

10 DLmTAGTTAIAGGG 

11 DUAOGTTTACGGTG 
11 DL3TACGGTGGGGA 
11 DUGTGGGGAGTCG 
n DUGGGAGTGGGTGA 
11 DUGGGTGATCCrATG 
11 DUCCTArGGTronT 
11 DUGGTIGTrrGGATO 
11 DUGnTGGArGGGT 
11 DL3AIt>CGTGGGAAr 
II DLSGGGAArrGTCATG 
11 DUGTCArGTATCAnGT 
11 DL3TC:ArGTAnTCGG 
11 DUTAnrCGGTAAA 
11 DUrrcGGTAAATGG 
11 DUGTAAATGGCArGT 
11 DUGCATCnAArOGTG 

11 DUGTAArCGTGTAAT 

12 DUGGGAGGGGTAC 
12 DUGGGTACGAATGT 
12 DUACGAATGTTCGTT 
12 DLJTGTrCGTrCATGT 
12 DUCGTTCArGTCGTr 



(SEQ ID. N0:1I3) 
(SEQ ID. N0aj4) 
(SEQ ID. K0:n5) 
(SEQID. N0:116) 
(SEQ ID. N0:117) 
(SEQ ID. N0:113) 
(SEQ ID. N0:119) 
(SEQ ID. NO:i:0) 
(SEQ ID. NO:12l) 
^EQID. NO:]22) 
(SEQ ID. N0:12J) 
(SEQ ID. NO:124) 
(SEQ ID, NO:125) 
(SEQ ID. NO:] 26) 
(SEQ ID. NO:127) 
^EQ ID. NO:123) 
(SEQ ID. NOa29) 
(SEQ ID. NO:130) 
(SEQ ID. N0:13J) 
(SEQ ID. NO:l32) 
(SEQ ID. NO:133) 
(SEQ ID. NO:134) 
(SEQ ID. NO:135) 
(SEQ ID. tsO:136) 
(SEQ ID. N0:137) 
(SEQ ID. N0:U3) 
(SEQ ID. NO:139) 
(SEQ ID. NO:140) 
(SEQ ID. N0:141) 
CSEQ ID. NOa42) 
(SEQ ID. NO:143) 
(SEQ ID. N0:144) 
(SEQ ID. NO:145) 
(SEQ ID. NO:146) 
(SEQ ID. NO:147) 
(SEQ ID. N-0:143) 
(SEQ ID. N'0:149) 
(SEQ ID. NO:150) 
(SEQ ID. N0:15l) 
(SEQ ID. NO:152) 
(SEQ ID. KOilS}) 
(SEQ CD. N-0:154) 
(SEQ ID. Nai55) 
(SEQ ID. NO:156) 
(SEQ ID. NO:203) 
(SEQ ID. NO-.204) 
(SEQ ID. NO:205) 
(SEQ ID. NO:206) 
(SEQ ID. NO-.207) 
(SEQ ID. NOJ08) 
(SEQ ID. NO J09) 
(SEQ ID. NO-.210} 
(SEQ ID. NO-.211) 
(SEQ ID. NO-J12) 
(SEQ ID. NO:2J3) 
(SEQ ID. NO-J14) 
(SEQ ID. NO-J15) 
^EQ ID, NO:216) 
CSEQ ID. NO:217) 
^EQ ID. NO-J18) 
(SEQ ID. N0:219) 
(SEQ ID. NO:220) 
(SEQ ID. N0221) 
(SEQ ID. SO:222) 
(SEQ ID. NO-.223) 
(SEQ ID. NO-.224) 
(SEQ ID, N0225) 
(SEQ ID. NO-.226) 
(SEQ ID. NO-^7) 
(SEQ ID. N0228) 
(SEQ ID. NO-.229) 
(SEQ ID. NO-.230) 
(SEQ ID. NO:231) 
(SEQ ID. NO:232) 
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15 4 DUTTG l i i X Ji iO GG 

16 4 DLSTCTTGGGArrGTC 

0 5 DUrGTATGAATGAnr 

1 5 DUTGAmCACACAA 

. 14 7 • DucrcrccGACcrc ■ 

15 7 DL3GAGCTCXKKXT 

16 7 DUrCGGCCTCGTC 

0 8 DUGATGAAGTCCCAG 

1 8 DUAGTCCCAGTATTT 

2 8 DUGTAnTCGGATTT 

3 8 DUTCGGATTTArCG 

4 8 DLSGATITArCGGGT 

5 8 DLJArCGGGTGTOCA 

6 8 DUTGTGCAAGGGGA 

7 8 DLJCAAGGGGAAnr 

8 8 DUGAArnVOTCrG'DV 

9 8 DUPCTGTAGTGCTAC 

10 8 DL3GTAGTGCrACCr 

11 8 DLJGCTACCTAGTAG 

12 8 DUCTAGTAGTCCAGA 

13 8 DUTCCAGArA9TGGG 

14 8 DUAGATACIGGGATA 

15 8 DUGGGArAArrcOT 

16 8 DUTAArrCGTGAGTO 

0 9 DUTATAGGGCGTCf 

1 9 DL?GGGCGTOTTCTCA 

2 9 DUGTGTTCrCACGAT 

3 9 DUTCACGATGAGAGG 

4 9 DUATGAGAGGAGCG 

5 9 DUAOGAGCGAGGC 

6 9 DLJCGAGGCCCXK/ 

7 9 DDGCCCGGGTATT 

8 9 DUCGGGTATTGTCA 

9 9 DUGTGAACCCCCAT 

10 9 DLSCCCCATCGATTT 

11 9 . DUATCGAnTCACTT 

12 9 DUrrrCACTTGACAr 

13 9 DUTTGACATAGAGCT 

14 9 DUTAGAGCTGTAGAC 

15 9 DL3GTAGACCAAGGA 

16 9 DUACCAACGAIXrAAG 

0 10 DLSCGTGTAATGTCAG 

1 10 DL3TX>TCAGTrrAGGG 
10 DUrcAGTITAGGGA 
10 DL3TAGGGAAGAGCA 
10 DUAAGAGCAGGGGT 
10 DUCAGGGGTACCTA 
10 DLJGGTACCTACIGG 
10 DLJTACTGGGGGGA 
10 DUGGGGGAGTCTAT 
13 DUCAIGTAl 1 i i iGG 
13 DUmrGGCTTAGG 
13 DUGGGTTAGGAIGT 
13 DUGGATGTAOTnTG 
13 DUTCTAGTnrrOGG 

13 DUnTGGGGGAGG 

14 DUGGGTrCATAACTG 
14 . DUATAACrGAGTGGG 
14 DUAACTGAGTGGGT- 
14 DUGTGGCTAGTrCT 
14 DUCTAGriGTTGGC 
14 DLKTrrOGCGATACA 
14 DUCGATACATAAAAG 
14 DLTTAAAAGCAFGTAA 
14 DLSGCArOTAAIGACG 
14 DUATOACGGTCCGT 
14 DUGTCGGTCGTACT 

14 DUGGTACrrATAACA 

15 DLJrCGArrCTAAGAr 
15 DLJrAAGArTAAArrr 
15 DUAAAnTGAATAAG 
15 DUAATAAGAGACAAG 
15 DUAAGAGACAAGAAA (SEQ ID. NO:268j 
15 DUAAGAAAOTACCC (SEQ ID. NO-.269) 
15 DUAAAOTAOCCCrr (SEQ ID. NOa70) 
15 DL3CCCCnCGTCrA (SEQ CD. NO:271) 
15 DLXIILOiLIAAAC (SEQ ID. NO:272) 
15 DUCTAAACCCATCG (SEQ ID. KO-.273; 
15 DUAACCCATGGTGG (SEQ ID. KO:274) 
15 DUTOGTGGOTrCAT (SEQ ID. NO:275) 



(SEQ ID. NO:107) 
(SEQ ID. NO:108) 
(SEQ ID. NO:109) 
(SEQCD.NO.-llO) 
.(SEQID.NO:157) 
(SEQ ED.'NO:15S) 
(SEQ ED. NO:159) 
(SEQ ID. NO:160) 
(SEQ ID. N0:16l) 
(SEQ ID. NO:162) 
(SEQ ID. NO:163) 
(SEQ ID. NO:164) 
(SEQ ED. NO:165) 
(SEQ ID. NO:166) 
(SEQ ID. NO:167) 
(SEQ ED. N0:16S) 
(SEQ ID. NO:169) 
(SEQ ID. NO:170) 
(SEQ ID. N0:17l) 
(SEQ ID. NO:172) 
(SEQ ID. NO;173) 
(SEQ ED. NO:174) 
(SEQ ED, NO:175) 
(SEQ ID. NO:176) 
(SEQ ID. NO:177) 
(SEQ ID. NO:178; 
(SEQ lb. NO:179) 
(SEQ ED. NO:180) 
(SEQ ED. N0:18l) 
(SEQ ED. NO:182) 
(SEQ ED. NO:183) 
(SEQ ID. NO:184) 
(SEQ ID. N0:1S5) 
(SEQ ID. NO:186) 
(SEQ lb. NO:187) 
(SEQID.NO:18S). - 
(SEQ ID. NO:189) 
(SEQ ED. NO:190) 
(SEQ ID. N0:19l) 
(SEQ ID. NO:192) 
(SEQ ED. NO:193) 
(SEQ ED. NO:194) 
(SEQ ED. NO:195) 
(SEQ ID. NO:196) 
(SEQ ED. NO:197) 
(SEQ ED. NO:198) 
(SEQ ED. NO:199) 
(SEQ ID. NO:200) 
(SEQ ED. NO:201) 
(SEQ ED. NO-.202) 
(SEQ ED. NO:24^ 
(SEQ ED. NO:247) 
(SEQ ED. NO:243) 
(SEQ ID. NO:249) 
(SEQ ED. NO:250) 
(SEQ ED. NO:25l) 
(SEQ ED. NO:252) 
(SEQ ED, NO:253) 
(SEQ ED. NO:254) 
(SEQ ED. NO:255) 
(SEQ ED. NO:256) 
(SEQ ID, NO:257) 
(SEQ CD. NO:25a) 
(SEQ ED. NO:259) 
(SEQ ID. NO:260) 
(SEQ ID. NO:26l) 
(SEQ ID. NO:262) 
(SEQ ID. NO:2d3) 
(SEQ CD. KO-.2tf4) 

(SEQ CD. mass) 

(SEQ ID. NO:266) 
(SEQ ID. NO:267)



10 12 DL30TCX>TX\GTTGG (SEQ ID. NO:233)
11 12 DL3TAGTTGGGAGrr (SEQ ID. NO:234)
12 12 DUGGAGTrcATAGTO (SEQ ID. NO:235)
13 12 DUArAOTOTXHACrr (SEQ ID. NO:236)
14 12 DUGTFIAGTrGACXfT (SEQ ID. NO:237)
15 12 bLSTGACGTTGAGGT (SEQ ID. NO:238)
16 12 DUOnTGAGGTrrA (SEQ ID. NO:239)
5 13 DUTATAACArGCCAT (SEQ ID. NO:240)
6 13 DUAACATGCCATGGT (SEQ ID. NO:241)
7 13 DUCCATGGTAITAr (SEQ ID. NO:242)
8 13 DUAnTATGAACTGG (SEQ ID. NO:243)
9 13 DUAACTGGTGOACAT (SEQ ID. NO:244)
10 13 DLTTGGACArCArGTA (SEQ ID. NO:245)

5 16 DLaTTGGAAAAAGGT (SEQ ID. NO:276)
6 16 DUAAAAGGTTCCTG (SEQ ID. NO:277)
7 16 DUGGTTCCTGrnA (SEQ ID. NO:278)
8 16 DL3C aoi i iAGTCTC (SEQ ID. NO:279)
9 16 DL3TtA 0iaCi lil i (SEQ ID. NO:280)
10 16 DL3 <,H i i IC AGAAAT (SEQ ID. NO:281)
11 16 DUAGAAAITGAGGTG (SEQ ID. NO:282)
12 16 DUAAAITGAGGTGar (SEQ ID. NO:283)
13 16 DLSGGTGGTAATCGT (SEQ ID. NO:284)
14 16 DUTAATCCTGOGTr (SEQ ID. NO:285)
15 16 DUGTGGGnTCOAT (SEQ ID. NO:286)
16 16 DUGGTrTCGAITCr (SEQ ID. NO:287)
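For an overlapping probe set like the one tabulated above, it is easy to check how many probes represent each position of the target. The sketch below counts per-position coverage from probe footprints. The footprints are hypothetical values chosen only to mimic variable lengths (9 to 14 nucleotides) and variable overlaps; they are not derived from the listed probes, and the function name is illustrative.

```python
def coverage(target_len, footprints):
    """Count how many probes cover each target position.

    footprints is a list of (start, length) pairs giving the 0-based
    stretch of the target that each probe is complementary to.
    """
    counts = [0] * target_len
    for start, length in footprints:
        for i in range(start, min(start + length, target_len)):
            counts[i] += 1
    return counts


# Hypothetical footprints on a 36-nt target: variable length, variable overlap.
footprints = [(0, 12), (6, 12), (12, 14), (20, 9), (26, 10)]
counts = coverage(36, footprints)

uncovered = [i for i, c in enumerate(counts) if c == 0]
covered_twice_or_more = sum(1 for c in counts if c >= 2)
print(uncovered)              # positions with no probe
print(covered_twice_or_more)  # positions represented by 2 or more probes
```

A design tool would run a check of this kind while adjusting probe lengths, since lengthening or shortening a probe to balance melting temperature also changes how many probes overlap each position.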



No probes were present in positions X, Y = 0, 12 to X, Y = 4, [...] The length of each of the probes on the chip was variable to minimize differences in melting temperature and potential for cross-hybridization. Each position in the sequence is represented by at least one probe and most positions are represented by 2 or more probes. As noted above, the amount of overlap between the oligonucleotides varies from probe to probe. FIG. 9 shows the human mitochondrial genome; "O_H" is the H strand origin of replication, and arrows indicate the cloned [...]

DNA was prepared from hair roots of six human donors (mt1 to mt6) and then amplified by PCR and cloned into M13; the resulting clones were sequenced using chain terminators to verify that the desired specific sequences were present. DNA from the sequenced M13 clones was amplified by PCR, transcribed in vitro, and labeled with fluorescein-UTP using T3 RNA polymerase. The 1.3 kb RNA transcripts were fragmented and hybridized to the chip. The results showed that each different individual had DNA that produced a unique hybridization fingerprint on the chip and that the differences in the observed patterns could be correlated with differences in the cloned genomic DNA sequence. The results also demonstrated that very long sequences of a target nucleic acid can be represented comprehensively as a specific set of overlapping oligonucleotides and that arrays of such probe sets can be usefully applied to genetic analysis.

The sample nucleic acid was hybridized to the chip in a solution composed of 6xSSPE, 0.1% Triton X-100 for 60 minutes at 15° C. The chip was then scanned by confocal scanning fluorescence microscopy. The individual features on the chip were 588x588 microns, but the lower left 5x5 square features in the array did not contain probes. To quantitate the data, pixel counts were measured within each synthesis site. Pixels represent 50x50 microns. The fluorescence intensity for each feature was scaled to a mean determined from 27 bright features. After scanning, the chip was stripped and rehybridized; all six samples were hybridized to the same chip. FIG. 10 shows the image observed from the mt4 sample on the DNA chip. FIG. 11 shows the image observed from the mt5 sample on the DNA chip. FIG. 12 shows the predicted difference image between the mt4 and mt5 samples [...] (see Anderson et al., 1981, Nature 290: 457-465, incorporated herein by reference). FIG. 13 shows the actual difference image observed.

The results show that, in almost all cases, mismatched probe/target hybrids resulted in lower fluorescence intensity than perfectly matched hybrids. Nonetheless, some probes detected mutations (or specific sequences) better than others, and in several cases, the differences were within noise levels. Improvements can be realized by increasing the amount of overlap between probes and hence overall probe density and, for duplex DNA targets, using a second set of probes, either on the same or a separate chip, corresponding to the second strand of the target. FIG. 14, in sheets 1 and 2, shows a plot of normalized intensities across rows 10 and 11 of the array and a tabulation of the mutations detected.

FIG. 15 shows the discrimination between wild-type and mutant hybrids obtained with this chip. The median of the six normalized hybridization scores for each probe was taken. The graph plots the ratio of the median score to the normalized hybridization score versus mean counts. On this graph, a ratio of 1.6 and mean counts above 50 yield no false positives, and while it is clear that detection of some mutants can be improved, excellent discrimination is achieved, considering the small size of the array. FIG. 16 illustrates how the identity of the base mismatch may influence the ability to discriminate mutant and wild-type sequences more than the position of the mismatch within an oligonucleotide probe. The mismatch position is expressed as % of probe [...] is indicated on the [...] increase the [...] standard reverse dot blot format by orders of magnitude, extending [...] the methods of the invention are more efficient and [...] than gel-based methods of nucleic acid sequence and mutation analysis.

These advantages become more apparent as chips with more and more probes are employed. To illustrate, the present invention provides a DNA chip for analyzing human mitochondrial DNA (mtDNA) that "tiles" through 646 nucleotides of human H strand mtDNA from positions 16280 to 356. The probes in the array are 15 nucleotides in length, and each position in the target sequence is represented by a set of 4 probes (A, C, G, T substitutions), which differed from one another at position 7 from the 3'-end. The array consists of 13 blocks of 4x50 probes: each block scans through 50 nucleotides of contiguous mtDNA sequence. The blocks are separated by blank rows. The 4 corner columns contain control probes; there are a total of 2600 probes in a [...] (feature), and each area is [...] primers tagged with T3 and T7 RNA polymerase promoter sequences and in vitro transcription to produce fluorescein-UTP labeled RNA. The RNA was fragmented and hybridized to the oligonucleotide array in a solution composed of 6xSSPE, 0.1% Triton X-100 for 60 minutes at 18° C. Unhybridized material was washed away with buffer, and the chip was scanned at 25 micron pixel resolution.

[...] mismatched with the probe at the designated site) are in bold. FIG. 18 shows the fluorescence image produced by scanning the chip when hybridized to this sample. About 95% of the sequence could be read correctly from only one strand of the original duplex target nucleic acid. Although some probes did not provide excellent discrimination and some probes did not appear to hybridize to the target efficiently, excellent results were achieved. The target sequence differed from the probe set at six positions: 4 transitions and 2 insertions. All 4 transitions were detected, and specific probes could readily be incorporated into the array to detect insertions or deletions. FIG. 19 illustrates the detection of 4 transitions in the target sequence relative to the wild-type probes on the chip.

[...] sequences can be read using the DNA chips and methods of the invention, as compared to conventional sequencing methods, where reading length is limited by the resolution of gel electrophoresis. Similar results were observed when genomic DNA samples were prepared from human hair roots. Hybridization and signal detection require less than an hour and can be readily shortened by appropriate choice of buffers, temperatures, probes, and reagents. In principle, longer sequence reads can be obtained than by conventional sequencing, where reading length is limited by the resolution of gel electrophoresis.

p53 Sequencing and Diagnostic DNA Chips

p53 is a tumor suppressor gene that has been found to be mutated in most forms of cancer (see Levine et al., 1991, Nature 351: 453-456, and Hollstein et al., 1991, Science 253: 49-53, each of which is incorporated herein by reference). In addition, there is a hereditary syndrome, Li-Fraumeni, in which individuals inherit mutant alleles of p53 and tend to have cancer at relatively young ages (Frebourg et al., 1992, PNAS 89: 6413-6417, incorporated herein by reference). During the development of a cancer, p53 is inactivated. The course of p53 inactivation generally involves a mutation in one copy of p53 and is often followed by deletion of the other copy. After p53 is inactivated, chromosomal abnormalities begin to appear in tumors. In [...] form of cancer, colorectal cancer, well over 50%, perhaps 80%, of all patients with tumors have p53 mutations. In addition, p53 mutations have been found in a high proportion of lung, breast, and other tumors (Rodrigues et al., 1990, PNAS 87: 7555-7559, incorporated herein by reference). According to data presented by David Sidransky (1992 San Diego Conference), over 400 mutations in p53 are known.

The p53 gene [...] in humans and has 11 exons, 10 of which are protein coding (see Tominaga et al., 1992, Critical Reviews in Oncogenesis 3: 257-282, incorporated herein by reference). The gene produces a 53 kilodalton phosphoprotein that regulates DNA replication. The protein acts to halt replication at the G1/S boundary in the cell cycle and is believed to act as a "molecular policeman," shutting down replication when the DNA is damaged or blocking the reproduction of DNA viruses (see Lane, 1992, Nature 358: 15-16, incorporated herein by reference). There is substantial interest in the cancer research community in analyzing p53 mutations. The NCI is currently funding contracts to characterize the p53 mutation spectra caused by various carcinogens. In addition, there are research projects which involve sequencing p53 from spontaneously arising tumors. A major resource in these studies is the huge supply of biopsy material stored in paraffin blocks. Also, there are projects which are aimed at analyzing the relationship [...] particular mutation in p53 and the functioning [...] Furthermore, there are projects looking at the germline inheritance of p53 mutations and the development of cancer. The present invention provides useful DNA chips and methods for such studies.

In addition, the present invention also provides a diagnostic test kit and method and p53 probes immobilized on a DNA chip in an organized array. Currently available diagnostic tests for cancer typically have a sensitivity of about 50%. The present invention provides significant advantages over such tests, and in one embodiment provides a method for detecting cancer-causing mutations in p53 that involves the steps of (1) obtaining a biopsy, which is optionally fractionated by cryostat sectioning to enrich tumor cells to about 80% of the total cell population. The DNA or RNA is then extracted, amplified, and analyzed with a DNA chip for the presence of p53 mutations correlated with malignancy.

To illustrate the value of the DNA chips of the present invention in such a method, a DNA chip was synthesized by the VLSIPS™ method to provide an array of overlapping probes which represent or tile across a 60 base region of exon 6 of the p53 gene. To demonstrate the ability to detect substitution mutations in the target, twelve different single substitution mutations (wild type and three different substitutions at each of three positions) were represented on the chip along with the wild type. Each of these mutations was represented by a series of twelve 12-mer oligonucleotide probes, which were complementary to the wild type target except at the one substituted base. Each of the twelve probes was complementary to a different region of the target and contained the mutated base at a different position (e.g., if the substitution was at base 32, the set of probes would be complementary, with the exception of base 32, to regions of the target 21-32, 22-33, and 32-43). This enabled investigation of the effect of the substitution position within the probe. The alignment of some of the probes with a 12-mer model target nucleic acid is shown in FIG. 20.

To demonstrate the effect of probe length, an additional series of ten 10-mer probes was included for each mutation (see FIG. 21). In the vicinity of the substituted positions, the wild-type sequence was represented by every possible overlapping 12-mer and 10-mer probe. To simplify comparisons, the probes corresponding to each varied position were arranged on the chip in the rectangular regions with the following structure: each row of cells represents one substitution, with the top row representing the wild type. Each column contains probes complementary to the same region of the target, with probes complementary to the 3'-end of the target on the left and probes complementary to the 5'-end of the target on the right. The difference between two adjacent columns is a single base shift in the positioning of the probes. Whenever possible, the series of 10-mer probes were placed in four rows immediately underneath and aligned with the 4 rows of 12-mer probes for the same mutation.

To provide model targets, 5' fluoresceinated 12-mers containing all possible substitutions in the first position of codon 192 were synthesized (see the starred position in the target in FIG. 20). Solutions containing 10 nM target DNA in 6xSSPE, 0.25% Triton X-100 were hybridized to the chip at room temperature for several hours. While target nucleic acid was hybridized to the chip, the fluorophores on the chip were excited by light from an argon laser, and the chip was scanned with an autofocusing confocal microscope. The emitted signals were processed by a PC to produce an image using image analysis software. By 1 to 3 hours, the signal had reached a plateau; to remove the hybridized target and
[...] stripped with 60% formamide, 2xSSPE at 17° C. for 5 minutes. The
washing buffer and temperature can vary, but the buffer typically contains
2-to-3xSSPE, 10-to-60% formamide (one can use multiple washes, increasing
the formamide concentration by 10% each wash, and scanning between washes
to determine when the wash is complete), and optionally a small percentage
of Triton X-100, and the temperature is typically in the range of 15° to
18° C.

Very distinct patterns were observed after hybridization with targets with
1 base substitutions and visualization with a confocal microscope and
software analysis, as shown in FIG. 22. In general, the probes which form
perfect matches with the target retain the highest signal. For example, in
the first image in FIG. 22, the 12-mer probes that form perfect matches
with the wild-type (WT) target are in the first row (top). The 12-mer
probes with single base mismatches are located in the second, third, and
fourth rows and have much lower signals. The data is also depicted
graphically in FIG. 23. On each graph, the X ordinate is the position of
the probe in its row on the chip, and the Y ordinate is the signal at that
probe site after hybridization.

When a target with a different one base substitution is hybridized, the
complementary set of probes has the highest signal (see pictures 2, 3, and
4 in FIG. 22 and graphs 2, 3, and 4 in FIG. 23). In each case, the probe
set with no mismatches with the target has the highest signals. Within a
12-mer probe set, the signal was highest at position 6 or 7. The graphs
show that the signal difference between 12-mer probes at the same X
ordinate tended to be greatest at positions 5 and 8 when the target and the
complementary probes formed 10 base pairs and 11 base pairs, respectively.
Because tumors often have both WT and mutant p53 genes, mixed target
populations were also hybridized to the chip, as shown in FIG. 24. When the
hybridization solution consisted of a 1:1 mixture of WT 12-mer and a 12-mer
with a substitution in position 7 of the target, the sets of probes that
were perfectly matched to both targets showed higher signals than the other
probe sets.

The hybridization efficiency of a 10-mer probe array as compared to a
12-mer probe array was also compared. The 10-mer and 12-mer probe arrays
gave comparable signals (see graphs 1-4 in FIG. 23 and graphs 1-4 in FIG.
25). However, the 10-mer probe sets, which are in rows 5-8 (see images in
FIG. 22), seemed to be better in this model system than the 12-mer probe
sets at resolving one target from another, consistent with the expectation
that one base mismatches are more destabilizing for 10-mers than 12-mers.
Hybridization results within probe sets perfectly matched to target also
followed the expectation that, the more matches the individual probe formed
with the target, the higher the signal. However, duplexes with two 3'
dangles (see FIG. 23, position 6 in graphs 1-4) have about as much signal
as the probes which are matched along their entire length (see FIG. 23,
position 7, in graphs 1-4).

This illustrative model system shows that 12-mer targets that differ by one
base substitutions can be readily distinguished from one another by the
novel probe array provided by the invention and that resolution of the
different 12-mer targets was somewhat better with the 10-mer probe sets
than with the 12-mer probe sets. The value of having several overlapping
probes hybridizing to a target demonstrates the value of the multiple
hybridization events that take place on a DNA chip of the invention. The
results also demonstrate the feasibility of constructing a probe set to
sequence the entire 1.4 kbp protein coding region of p53 or alternatively
the 0.6 kbp of exons 5-9 containing mutation hot spots.

For sequencing, the p53 DNA can be cloned from the sample or directly
amplified from genomic DNA by PCR. If genomic PCR is used, then the DNA can
be diluted prior to amplification so that a single copy of the gene is
amplified. For diagnostic purposes, the genomic DNA can be isolated from a
tumor biopsy in which the tumor cells may be the majority population. As
noted above, the proportion of tumor cells in a sample can be enriched by
cryostat sectioning. DNA can also be isolated and amplified from tumor
samples stored in paraffin blocks.

The p53 DNA in the sample can be amplified by PCR (although other
amplification methods can be used) using 3-4 primer pairs generating
amplicons of <3 kbp each. Illustrative primers of the invention for
amplifying exon 5 of the p53 gene are shown below (B is biotin; F is
fluorescein)

5'-B-CACTTGTGCCCTGACTTTCAAC-3' (SEQ. ID NO:288)
5'-F-CACTTGTGCCCTGACTTTCAAC-3'
5'-ATGCAATTAACCCTCACTAAAGGGAGACACTTGTGCCCTGACTTTCAAC-3' (SEQ. ID NO:289)
   (has T3 promoter)
5'-B-GACCCTGGGCAACCAGCCCTGTCGT-3' (SEQ. ID NO:290)
5'-F-GACCCTGGGCAACCAGCCCTGTCGT-3'
5'-TAATACGACTCACTATAGGGAGGACCCTGGGCAACCAGCCCTGTCGT-3' (SEQ. ID NO:291)
   (has T7 promoter)

After PCR amplification of the target (the amplified target is called the
"amplicon") one strand of the amplicon can then be isolated, i.e., using a
biotinylated primer that allows capture of the undesired strand on
streptavidin beads. Alternatively, asymmetric PCR can be used to generate a
single-stranded target. Another approach involves the generation of single
stranded RNA from the PCR product by incorporating a T7 or other RNA
polymerase promoter in one of the primers. The single-stranded material can
optionally be fragmented to generate smaller nucleic acids with less
significant secondary structure than longer nucleic acids.

In one such method, fragmentation is combined with labeling. To illustrate,
degenerate 8-mers or other degenerate short oligonucleotides are hybridized
to the single-stranded target material. In the next step, a DNA polymerase
is added with the four different dideoxynucleotides, each labeled with a
different fluorophore. Fluorophore-labeled dideoxynucleotides are available
from a variety of commercial suppliers, such as ABI. Hybridized 8-mers are
extended by a labeled dideoxynucleotide. After an optional purification
step, i.e., with a size exclusion column, the labeled 9-mers are hybridized
to the chip. Other methods of target fragmentation can be employed. The
single-stranded DNA can be fragmented by partial degradation with a DNAse
or partial depurination with acid. Labeling can be accomplished in a
separate step, i.e., fluorophore-labeled nucleotides are incorporated
before the fragmentation step or a DNA binding fluorophore, such as
ethidium homodimer, is attached to the target after fragmentation.

In one embodiment, the DNA chip has an array of 10* to 10' probes tiling
across the protein coding regions of p53, which comprise about 1200 bp;
smaller arrays specific for the 600 bp mutational hot spot region are also
useful. The probes overlap for N-2 to N-4 bases, where N is the length of
the probe in bases. N is typically 10 to 14 bases long, but as will be seen
below, probes 15 to 19 bases and longer are also useful. Every possible
single base substitution occurring one at a time is represented in the
array. The number of unique 10-mer probes with 7 base overlaps would be
about
(1200/3)x4x10 or about 1.6x10^4. To allow 3 replicates of of DNA. First, the target DNA is amplified by PCR with
each probe, one might have a total array size on the order of primers allowing easy ligation into a vector, which is taken
4.8x10^4 probes. Of course, arrays of probes within the up by transformation of E. coli, which in turn must be
ranges of 10" to 10* probes are also useful for applications; cultured, typically on plates overnight. After growth of the
for example, very large arrays of 10* or more probes are bacteria, DNA is purified in a procedure that typically takes
useful for sequencing or sequence checking large genomic about 2 hours; then, the sequencing reactions are performed,
DNA fragments. Optionally fragmented and labeled target which takes at least another hour, and the samples are run on
nucleic acid hybridized to the chip is detected by a confocal for several hours, the duration depending on the

microscope or other imaging device. The pattern of sites ''"8"' .^f^S"'"" be sequenced. By contrast, the 
"Ughu-ng up" with Urget is preferably analyzed with com- JO P««ot mvenUooproWdes direct analysis of the PCR ampU- 
puter assistance to provide the sequence of the target from matenal after brief transcripUon and fragmentation 

the pattern of sites producing signals. ' ."^"^ '"f* ^}'°'- 
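The probe-count estimate quoted above, (1200/3)x4x10, can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the step between adjacent tiled probes equals the probe length minus the overlap and that the factor of 4 counts the four bases tested at each of the probe's positions.

```python
def tiling_probe_count(region_bp: int, probe_len: int, overlap: int,
                       bases_per_position: int = 4) -> int:
    """Estimate the unique probes needed to tile a region.

    Mirrors the arithmetic in the text: (region / step) x 4 x probe length,
    where step = probe_len - overlap (an assumption of this sketch).
    """
    step = probe_len - overlap
    if step <= 0:
        raise ValueError("overlap must be shorter than the probe")
    tiles = region_bp // step          # number of tiled probe start positions
    return tiles * bases_per_position * probe_len


def array_size(unique_probes: int, replicates: int = 3) -> int:
    """Total synthesis sites when every probe is replicated."""
    return unique_probes * replicates


if __name__ == "__main__":
    unique = tiling_probe_count(region_bp=1200, probe_len=10, overlap=7)
    print(unique, array_size(unique))  # 16000 unique probes, 48000 sites
```

With the values given in the text (1200 bp, 10-mers, 7 base overlaps, 3 replicates) this returns 16,000 unique probes and 48,000 sites, matching the quoted 1.6x10^4 and 4.8x10^4.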

The invention is illustrated below with examples of DNA mterestmg dmrcal application for the characterization 

chips comprising very large arrays of DNAprobes to "rese- DNA chips is as foUows. 

quence" p53 target nucleic acid in a sample. To analyze is 'pf'V'duals with germhne cancer mutations have a very high 
DNA from exon 5 of the p53 tumor suppressor gene, a set ,J!f M'" '^"tmenl by irradiation, 

of overlapping 17-mer probes was syniesized on a chip. ^""L P"""'* ""^y ^^"^ g'™""* 'n"'^- 

TTie probes for the WI allele were synthesized so as to tile ^°'^J°' tumor suppressor genes. TTius. before 

across the entire cxon with single base overlaps between '^'^'^^ ""/^tfTT- , *.P''y*'f"° 
probes. For each WI probe, a sets of 4 additional probes. 20 ^^'^^ » 

one for each possible base substitution at position 7. were «r™^.*"PP'T,- S^^^"'*'**"'-. 
synthesized and placed in a column relaUVe to the WTprobe. DNA Chips for Rational llerapeutic Management 
Exon 5 DNA was amplified by PCR with primers flanking , P««='« >°Y'.°"*"' ^ P"'^'''" 
the exon. One of the primers was labeled with fluorescein' ^'^ Pty«c'»'^ dctermme optunum therapeutic 

the other primer was labeled with biotin. After amplification. 25 P'^""''* "Pid detection of biologically mediated 

the biotinylated strand was removed by binding to strepta- '« alherapeuUc agent m a vanety of disease states, 

vidin beads. The fluoresceioated strand was used in hybrid- DNA chip are many, as the chips will 

iinioa "''P physicians recognize health care cost savings, achieve 

About 'A of the amplified, single-stranded nucleic acid "P"** ""'"P*"'!*: benefits, bmil administration of ineffective 
was hybridized ovcrm-ght in SxSSPE at 60* C. to the probe 30 '° resistance) yet toxic drugs, monitor changes in 
chip (under a cover slip). After washing with 6xSSPE. the • P*'i^8" ^eaeasc pat{.ogen acquisition of 

chipwasscanned using confocal microscopy. FIG. 26 shows .re^aUnce.. Important appbcauons^dude the treatment of • 
an image of the p53 chip hybridized to L taiget DNA. HI\^ other mfecuous diseases, and cancer. 
Analysis of the intensity data showed that 93.5% of the 184 ^Y*"^. * hrgc^nd expandmg number of people, 

bases of exon 5 were caUed in agreement with the WT 35 '"Vl!^! ''""5 HIV can 

sequence (see Buchman et al., 1988. Gene 70: 245-252. «P««ly become resistant to drugs used to treat the mfeclion. 
incorporated herein by reference). The miscalled bases were l^'^''^l^Tr.^^V'''°'' . heterodimeric protein (51 
from positions where probe signal intensities were tied ^ ^^^^ HIV reverse transcriptase (RT) encoded by 
(1.6%) and where non-WT probes had the highest signal "* P?" .'^ T ^" 

intensity (4.9%). FIG. 27 illustrates howthe actual sequence 40 "'/JJ^P"'"" ^°'?^^,T.'A'"'"i 
was .cad. Gaps in the sequence of letters in the WT rows f i?'^' ""'=''«'f ' f°'^'^!f: 'f' f 
correspond to control probes or sites. Positions at which ^^^"'^V '» HI V infecuon are converted to 

bases are miscaUed are represented by letters in iulic type in ""f'"'"'' "/!°f " ^'^ ^."l""!!'' PhosphorylaUon m the 
cells corresponding to pribes in which the WT bases have ^f^'^. f f^^^, «"f- *h«re incoqwraton of the 
been substituted by other bases. 4S "V"*".' "^"J """^"^ termmation of viral 
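The base-calling rule implied by the exon 5 results (the probe with the highest signal at each position gives the call, tied intensities give no call) can be sketched as follows. This is a minimal illustration, not the analysis software used in the experiment; the function names and the intensity values in the usage lines are hypothetical.

```python
from collections import Counter


def call_base(intensities: dict) -> str:
    """Call one position from the signals of its four probes.

    `intensities` maps each candidate base to the signal of the probe
    carrying that base. Returns the base with the highest signal, or "N"
    when the top signal is tied.
    """
    top = max(intensities.values())
    winners = [base for base, signal in intensities.items() if signal == top]
    return winners[0] if len(winners) == 1 else "N"


def score_calls(reference: str, per_position: list):
    """Call every position and tally agreement with the reference (WT)."""
    calls = "".join(call_base(p) for p in per_position)
    tally = Counter()
    for ref, called in zip(reference, calls):
        if called == "N":
            tally["tied"] += 1
        elif called == ref:
            tally["agree"] += 1
        else:
            tally["non_wt_highest"] += 1
    return calls, tally


if __name__ == "__main__":
    # Hypothetical intensities for a three-base reference "ACG".
    data = [
        {"A": 9, "C": 1, "G": 1, "T": 1},  # clear WT call
        {"A": 2, "C": 2, "G": 1, "T": 1},  # tie between A and C
        {"A": 1, "C": 1, "G": 1, "T": 5},  # non-WT probe is brightest
    ]
    print(score_calls("ACG", data))
```

The three tally categories correspond to the three outcomes reported in the text for the 184 bases of exon 5: calls agreeing with WT (93.5%), tied intensities (1.6%), and positions where a non-WT probe had the highest signal (4.9%).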

As the diagram indicates, the miscaUed bases are from the «Pl|«"<'°. b«:a«se the 5'-3;ptosphodiester linkage can- 
low intensity areas of the image, which may be due to »ot be completed However, withm af er 6 m^ 
secondary structure in the Urget or probes preventing inter- ?^ '>P'f' ^« ^ f"' « »f' *> 
molecular hybridization. To diaunish the effects due to ""'fPf * of mcorporatrng the analogue and so 

secondary stnicmre. one can employ shorter targets (i.e.. by SO f","*^",' Several known inutaUons are shown 

target , fragmentation) or use more stringent hybridization w uouiar torm Deiow. 

conditions. In addition, the use of a set of probes synthesized • 

by tiling across the other strand of a duplex target can also rt Mtn-AnoNS associated wrm drug resistance 

provide sequence information buned in secondary structure 
in the other strand. It should be appredated. however, that SS ANTT- 

the pattern of low intensity areas that fotms as a result of vQAL CODON «iCHa.\OE ntCHAKGE 
secondary structure in the target itself provides a means to 
identify that a spedflc target sequence is present in a sample. 
Other factors that may contribute to lower signal intensities 
include differences in probe densities and hybridization 60 
subiliu'es. 

These results demonstrate the advantages provided by the 

DNA chips of the invention to genetic analysis. As another 

example heterozygous mutations are currently sequenced ,,.b. „.at,-.o* <.srcr ,»uuaa <o eU>«r dn.g> b viuo 
by an arduous process involving clooiag and repurification 65 

of DNA. The cloning step is required, because the gel The present inveolion provides DNA chips for delecting 
sequencing systems are poor at resolving even a 1:1 mixture the multiple mutations in the HI V RT gene assodated with 
resistance to different therapeutics. These DNA chips will
enable physicians to monitor mutations over time and to
change therapeutics if resistance develops. The DNA chip
will provide redundant confirmation of conserved HIV RT
and other gene sequences, and the probes on the chip will tile
through, with overlap, in important mutational hot spot
regions. The chip will optionally have probes that span the
entire coding region of the RT and optionally the genes for
other HIV proteins, such as coat proteins. HIV target nucleic
acid can be isolated from blood samples (peripheral blood
lymphocytes or PBMC) and amplified by PCR, primers for
which are shown in the table below.
AMPLIFICATION OF TARGET

TARGET SIZE   PRIMER 1                        PRIMER 2

742 bp        GTAGAATTCTGTTGACTCAGATTGG       GATAAGCTTGGGCCTTATCTATTCCAT
              (SEQ ID. NO:292)                (SEQ ID. NO:294)

335 bp        AAATCCATACAATACTCCAGTATTTGC     ACCCATCCAAAGGAATGGAGGTTCTTTC
              (SEQ ID. NO:293)                (SEQ ID. NO:295)

323 bp        GenBank K02013 1889-1908        2211-2192

The HIV RT gene chips of the invention, as well as the CF, mtDNA, and p53
DNA chips of the invention, illustrate the diverse application of the
methods and probe arrays of the invention. The examples that follow
describe methods for preparing nucleic acid targets from samples for
application to the DNA chips of the invention and provide additional
details of the methods of the invention.

EXAMPLES

I. VLSIPS™ Technology

As noted above, the VLSIPS™ technology is described in a number of patent
publications and is preferred for making the oligonucleotide arrays of the
invention. For completeness, a brief description of how this technology can
be used to make and screen DNA chips is provided in this Example and the
accompanying Figures. In the VLSIPS method, light is shone through a mask
to activate functional (for oligonucleotides, typically an -OH) groups
protected with a photoremovable protecting group on a surface of a solid
support. After light activation, a nucleoside building block, itself
protected with a photoremovable protecting group (at the 5'-OH), is coupled
to the activated areas of the support. The process can be repeated, using
different masks or mask orientations and building blocks, to prepare very
dense arrays of many different oligonucleotide probes. The process is
illustrated in FIG. 28; FIG. 29 illustrates how the process can be used to
prepare "nucleoside combinatorials" or oligonucleotides synthesized by
coupling all four nucleosides to form dimers, trimers, etc.

New methods for the combinatorial chemical synthesis of peptide,
polycarbamate, and oligonucleotide arrays have recently been reported (see
Fodor et al., 1991, Science 251: 767-773; Cho et al., 1993, Science 261:
1303-1305; and Southern et al., 1992, Genomics 13: 1008-1017, each of which
is incorporated herein by reference). These arrays, or biological chips
(see Fodor et al., 1993, Nature 364: 555-556, incorporated herein by
reference), harbor specific chemical compounds at precise locations in a
high-density, information rich format, and are a powerful tool for the
study of biological recognition processes. A particularly exciting
application of the array technology is in the field of DNA sequence
analysis. The hybridization pattern of a DNA target to an array of shorter
oligonucleotide probes is used to gain primary structure information of the
DNA target. This format has important applications in sequencing by
hybridization, DNA diagnostics and in elucidating the thermodynamic
parameters affecting nucleic acid recognition.

Conventional DNA sequencing technology is a laborious procedure requiring
electrophoretic size separation of labeled DNA fragments. An alternative
approach, termed Sequencing By Hybridization (SBH), has been proposed
(Lysov et al., 1988, Dokl. Akad. Nauk SSSR 303: 1508-1511; Bains et al.,
1988, J. Theor. Biol. 135: 303-307; and Drmanac et al., 1989, Genomics 4:
114-128, incorporated herein by reference). This method uses a set of short
oligonucleotide probes of defined sequence to search for complementary
sequences on a longer target strand of DNA. The hybridization pattern is
used to reconstruct the target DNA sequence. It is envisioned that
hybridization analysis of large numbers of probes can be used to sequence
long stretches of DNA. In immediate applications of this hybridization
methodology, a small number of probes can be used to interrogate local DNA
sequence.

The strategy of SBH can be illustrated by the following example. A 12-mer
target DNA sequence, AGCCTAGCTGAA, (SEQ. ID NO:296) is mixed with a
complete set of octanucleotide probes. If only perfect complementarity is
considered, five of the 65,536 octamer probes (TCGGATCG, CGGATCGA,
GGATCGAC, GATCGACT, and ATCGACTT) will hybridize to the target. Alignment
of the overlapping sequences from the hybridizing probes reconstructs the
complement of the original 12-mer target:

TCGGATCG
 CGGATCGA
  GGATCGAC
   GATCGACT
    ATCGACTT
TCGGATCGACTT (SEQ. ID NO:297)

Hybridization methodology can be carried out by attaching target DNA to a
surface. The target is interrogated with a set of oligonucleotide probes,
one at a time (see Strezoska et al., 1991, Proc. Natl. Acad. Sci. USA 88:
10089-10093, and Drmanac et al., 1993, Science 260: 1649-1652, each of
which is incorporated herein by reference). This approach can be
implemented with well established methods of immobilization and
hybridization detection, but involves a large number of manipulations. For
example, to probe a sequence utilizing a full set of octanucleotides, tens
of thousands of hybridization reactions must be performed. Alternatively,
SBH can be carried out by attaching probes to a surface in an array format
where the identity of the probes at each site is known. The target DNA is
then added to the array of probes. The hybridization pattern determined in
a single experiment directly reveals the identity of all complementary
probes.
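The SBH example with the 12-mer target AGCCTAGCTGAA and its five hybridizing octamers can be reproduced in a few lines. The sketch below is an illustration under simplifying assumptions: only perfect matches are considered, the target contains no repeated 7-mer (so each overlap is unambiguous), and the complement is written base by base in the same left-to-right order used in the text. The function names are hypothetical.

```python
def complement(seq: str) -> str:
    """Base-by-base complement, without reversal, as written in the text."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def hybridizing_probes(target: str, k: int = 8) -> list:
    """All k-mers perfectly complementary to the target, in positional order."""
    comp = complement(target)
    return [comp[i:i + k] for i in range(len(comp) - k + 1)]


def reconstruct(probes) -> str:
    """Chain probes by their (k-1)-base overlaps to rebuild the sequence.

    Raises ValueError when the overlaps are ambiguous, which happens for
    targets containing repeats.
    """
    remaining = set(probes)
    k = len(next(iter(remaining)))
    suffixes = {p[1:] for p in remaining}
    # The first probe is the one whose leading k-1 bases follow no other probe.
    starts = [p for p in remaining if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("ambiguous or circular overlap set")
    seq = starts[0]
    remaining.remove(seq)
    while remaining:
        nxt = [p for p in remaining if p[:-1] == seq[-(k - 1):]]
        if len(nxt) != 1:
            raise ValueError("ambiguous overlap; repeats need extra information")
        seq += nxt[0][-1]
        remaining.remove(nxt[0])
    return seq


if __name__ == "__main__":
    probes = hybridizing_probes("AGCCTAGCTGAA")
    print(probes)
    rebuilt = reconstruct(probes)
    print(rebuilt, complement(rebuilt))
```

Running it on the target from the text yields the five octamers listed there, rebuilds TCGGATCGACTT from them, and recovers AGCCTAGCTGAA as its complement.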
As noted above, a preferred method of oligonucleotide probe array synthesis
involves the use of light to direct the synthesis of oligonucleotide probes
in high-density, miniaturized arrays. Photolabile 5'-protected
N-acyl-deoxynucleoside phosphoramidites, surface linker chemistry, and
versatile combinatorial synthesis strategies have been developed for this
technology. Matrices of spatially-defined oligonucleotide probes have been
generated, and the ability to use these arrays to identify complementary
sequences has been demonstrated by hybridizing fluorescent labeled
oligonucleotides to the DNA chips produced by the methods. The
hybridization pattern demonstrates a high degree of base specificity and
reveals the sequence of oligonucleotide targets.

The basic strategy for light-directed oligonucleotide synthesis (1) is
outlined in FIG. 28. The surface of a solid support modified with
photolabile protecting groups (X) is illuminated through a
photolithographic mask, yielding reactive hydroxyl groups in the
illuminated regions. A 3'-O-phosphoramidite activated deoxynucleoside
(protected at the 5'-hydroxyl with a photolabile group) is then presented
to the surface and coupling occurs at sites that were exposed to light.
Following capping, and oxidation, the substrate is rinsed and the surface
illuminated through a second mask, to expose additional hydroxyl groups for
coupling. A second 5'-protected, 3'-O-phosphoramidite activated
deoxynucleoside is presented to the surface. The selective
photodeprotection and coupling cycles are repeated until the desired set of
products is obtained.

Light directed chemical synthesis lends itself to highly efficient
synthesis strategies which will generate a maximum number of compounds in a
minimum number of chemical steps. For example, the complete set of 4^n
polynucleotides (length n), or any subset of this set can be produced in
only 4xn chemical steps. See FIG. 29. The patterns of illumination and the
order of chemical reactants ultimately define the products and their
locations. Because photolithography is used, the process can be
miniaturized to generate high-density arrays of oligonucleotide probes. For
an example of the nomenclature useful for describing such arrays, an array
containing all possible octanucleotides of dA and dT is written as (A+T)^8.
Expansion of this polynomial reveals the identity of all 256 octanucleotide
probes from AAAAAAAA to TTTTTTTT. A DNA array composed of complete sets of
dinucleotides is referred to as having a complexity of 2. The array given
by (A+T+C+G)^8 is the full 65,536 octanucleotide array of complexity four.

To carry out hybridization of DNA targets to the probe arrays, the arrays
are mounted in a thermostatically controlled hybridization chamber.
Fluorescein labeled DNA targets are injected into the chamber and
hybridization is allowed to proceed for ½ to 2 hours. The surface of the
matrix is scanned in an epifluorescence microscope (Zeiss Axioscop 20)
equipped with photon counting electronics using 50-100 µW of 488 nm
excitation from an Argon ion laser (Spectra Physics model 2020). All
measurements are acquired with the target solution in contact with the
probe matrix. Photon counts are stored and image files are presented after
conversion to an eight bit image format. See FIG. 33.

When hybridizing a DNA target to an oligonucleotide array, N = Lt - (Lp - 1)
complementary hybrids are expected, where N is the number of hybrids, Lt is
the length of the DNA target, and Lp is the length of the oligonucleotide
probes on the array. For example, for an 11-mer hybridized to an
octanucleotide array, N = 4. Hybridizations with mismatches at positions
that are 2 to 3 residues from either end of the probes will generate
detectable signals. Modifying the above expression for N, one arrives at a
relationship estimating the number of detectable hybridizations (Nd) for a
DNA target of length Lt and an array of complexity C. Assuming an average
of 5 positions giving signals above background:
Nd = (1 + 5(C - 1))[Lt - (Lp - 1)].

Arrays of oligonucleotides can be efficiently generated by light-directed
synthesis and can be used to determine the identity of DNA target
sequences. Because combinatorial strategies are used, the number of
compounds increases exponentially while the number of chemical coupling
cycles increases only linearly. For example, expanding the synthesis to the
complete set of 4^8 (65,536) octanucleotides will add only four hours to
the synthesis for the 16 additional cycles. Furthermore, combinatorial
synthesis strategies can be implemented to generate arrays of any desired
composition. For example, because the entire set of dodecamers (4^12) can
be produced in 48 photolysis and coupling cycles (b^n compounds requires
bxn cycles), any subset of the dodecamers (including any subset of shorter
oligonucleotides) can be constructed with the correct lithographic mask
design in 48 or fewer chemical coupling steps. In addition, the number of
compounds in an array is limited only by the density of synthesis sites and
the overall array size. Recent experiments have demonstrated hybridization
to probes synthesized in 25 µm sites. At this resolution, the entire set of
65,536 octanucleotides can be placed in an array measuring 0.64 cm square,
and the set of 1,048,576 dodecanucleotides requires only a 2.56 cm array.

Genome sequencing projects will ultimately be limited by DNA sequencing
technologies. Current sequencing methodologies are highly reliant on
complex procedures and require substantial manual effort. Sequencing by
hybridization has the potential for transforming many of the manual efforts
into more efficient and automated formats. Light-directed synthesis is an
efficient means for large scale production of miniaturized arrays for SBH.
The oligonucleotide arrays are not limited to primary sequencing
applications. Because single base changes cause multiple changes in the
hybridization pattern, the oligonucleotide arrays provide a powerful means
to check the accuracy of previously elucidated DNA sequence, or to scan for
changes within a sequence. In the case of octanucleotides, a single base
change in the target DNA results in the loss of eight complements, and
generates eight new complements. Matching of hybridization patterns may be
useful in resolving sequencing ambiguities from standard gel techniques, or
for rapidly detecting DNA mutational events. The potentially very high
information content of light-directed oligonucleotide arrays will change
genetic diagnostic testing. Sequence comparisons of hundreds to thousands
of different genes will be assayed simultaneously instead of the current
one, or few at a time format. Custom arrays can also be constructed to
contain genetic markers for the rapid identification of a wide variety of
pathogenic organisms.

Oligonucleotide arrays can also be applied to study the sequence
specificity of RNA or protein-DNA interactions. Experiments can be designed
to elucidate specificity rules of non Watson-Crick oligonucleotide
structures or to investigate the use of novel synthetic nucleoside analogs
for antisense or triple helix applications. Suitably protected RNA monomers
may be employed for RNA synthesis. The oligonucleotide arrays should find
broad application deducing the thermodynamic and kinetic rules governing
formation and stability of oligonucleotide complexes.

Other than the use of photoremovable protecting groups, the nucleoside
coupling chemistry is very similar to that



5,837,832 






used routinely today for oligonucleotide synthesis. FIG. 30
shows the deprotection, coupling, and oxidation steps of a
solid phase DNA synthesis method. FIG. 31 shows an
illustrative synthesis route for the nucleoside building blocks
used in the method. FIG. 32 shows a preferred photoremovable
protecting group, MeNPOC, and how to prepare the
group in active form. The procedures described below show
how to prepare these reagents. The nucleoside building
blocks are 5'-MeNPOC-THYMIDINE-3'-OCEP;
5'-MeNPOC-N4-t-BUTYL PHENOXYACETYL-
DEOXYCYTIDINE-3'-OCEP; 5'-MeNPOC-N2-t-BUTYL
PHENOXYACETYL-DEOXYGUANOSINE-3'-OCEP;
and 5'-MeNPOC-N6-t-BUTYL PHENOXYACETYL-
DEOXYADENOSINE-3'-OCEP.
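The counting rules quoted earlier in this Example (expected hybrids, detectable hybrids, coupling cycles, and array dimensions at a given site size) are simple enough to check numerically. The sketch below is illustrative; the function names are hypothetical, and the array is assumed to be a filled square grid of synthesis sites.

```python
import math


def expected_hybrids(target_len: int, probe_len: int) -> int:
    """N = Lt - (Lp - 1): perfectly matched probes on a complete array."""
    return max(target_len - (probe_len - 1), 0)


def detectable_hybrids(target_len: int, probe_len: int, complexity: int,
                       tolerant_positions: int = 5) -> int:
    """Nd = (1 + 5(C - 1)) * [Lt - (Lp - 1)], with 5 mismatch-tolerant positions."""
    per_hybrid = 1 + tolerant_positions * (complexity - 1)
    return per_hybrid * expected_hybrids(target_len, probe_len)


def coupling_cycles(n_bases: int, probe_len: int) -> int:
    """b^n compounds require b x n photolysis and coupling cycles."""
    return n_bases * probe_len


def array_side_cm(n_probes: int, site_um: float = 25.0) -> float:
    """Side length of a square array holding n_probes sites of the given size."""
    sites_per_side = math.isqrt(n_probes)
    if sites_per_side * sites_per_side < n_probes:
        sites_per_side += 1            # round up for non-square counts
    return sites_per_side * site_um / 10_000   # micrometers to centimeters


if __name__ == "__main__":
    print(expected_hybrids(11, 8))             # 11-mer on an octamer array
    print(detectable_hybrids(11, 8, 4))
    print(coupling_cycles(4, 12))              # complete dodecamer set
    print(array_side_cm(4 ** 8), array_side_cm(4 ** 10), array_side_cm(4 ** 12))
```

For an 11-mer on an octanucleotide array this gives N = 4, and 48 cycles for the dodecamer set, as stated in the text. At 25 µm sites, 65,536 probes fit a 0.64 cm square. Note that 1,048,576 equals 4^10, so the quoted 2.56 cm figure corresponds to a complete set of 10-mers under these assumptions; a complete set of 4^12 dodecamers would need a side of about 10.24 cm at the same site size.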

A. Preparation of 4,5-methylenedioxy-2-nitroacetophenone

A solution of 50 g (0.305 mole) 3,4-methylenedioxyacetophenone (Aldrich) in
200 mL glacial acetic acid was added dropwise over 30 minutes to 700 mL of
cold (2-4° C.) 70% HNO3 with stirring (NOTE: the reaction will overheat
without external cooling from an ice bath, which can be dangerous and lead
to side products. At temperatures below 0° C., however, the reaction can be
sluggish. A temperature of 3-5° C. seems to be optimal). The mixture was
left stirring for another 60 minutes at 3-5° C., and then allowed to
approach ambient temperature. Analysis by TLC (25% EtOAc in hexane)
indicated complete conversion of the starting material within 1-2 hr. When
the reaction was complete, the mixture was poured into ~3 liters of crushed
ice, and the resulting yellow solid was filtered off, washed with water and
then suction-dried. Yield ~53 g (84%), used without further purification.

B. Preparation of 1-(4,5-Methylenedioxy-2-nitrophenyl)ethanol

Sodium borohydride (10 g; 0.27 mol) was added slowly to a cold, stirring
suspension of 53 g (0.25 mol) of 4,5-methylenedioxy-2-nitroacetophenone in
400 mL methanol. The temperature was kept below 10° C. by slow addition of
the NaBH4 and external cooling with an ice bath. Stirring was continued at
ambient temperature for another two hours, at which time TLC (CH2Cl2)
indicated complete conversion of the ketone. The mixture was poured into
one liter of ice-water and the resulting suspension was neutralized with
ammonium chloride and then extracted three times with 400 mL CH2Cl2 or
EtOAc (the product can be collected by filtration and washed at this point,
but it is somewhat soluble in water and this results in a yield of only
~60%). The combined organic extracts were washed with brine, then dried
with MgSO4 and evaporated. The crude product was purified from the main
byproduct by dissolving it in a minimum volume of CH2Cl2 or THF (~175 ml)
and then precipitating it by slowly adding hexane (1000 ml) while stirring
(yield 51 g; 80% overall). It can also be recrystallized (e.g.,
toluene-hexane), but this reduces the yield.

C. Preparation of 1-(4,5-methylenedioxy-2-nitrophenyl)ethyl chloroformate
(MeNPOC-Cl)

Phosgene (500 mL of 20% w/v in toluene from Fluka: 965 mmole; 4 eq.) was
added slowly to a cold, stirring solution of 50 g (237 mmole; 1 eq.) of
1-(4,5-methylenedioxy-2-nitrophenyl)ethanol in 400 mL dry THF. The solution
was stirred overnight at ambient temperature at which point TLC (20%
Et2O/hexane) indicated >95% conversion. The mixture was evaporated (an
oil-less pump with downstream aqueous NaOH trap is recommended to remove
the excess phosgene) to afford a viscous brown oil. Purification was
effected by flash chromatography on a short (9x13 cm) column of silica gel
eluted with 20% Et2O/hexane. Typically 55 g (85%) of the solid yellow
MeNPOC-Cl is obtained by this procedure. The crude material has also been
recrystallized in 2-3 crops from 1:1 ether/hexane. On this scale, ~100 ml
is used for the first crop, with a few percent THF added to aid
dissolution, and then cooling overnight at -20° C. (this procedure has not
been optimized). The product should be stored desiccated at -20° C.

D. Synthesis of 5'-MeNPOC-2'-DEOXYNUCLEOSIDE-3'-(N,N-DIISOPROPYL
2-CYANOETHYL PHOSPHORAMIDITES)

(1) 5'-MeNPOC-Nucleosides

[Reaction scheme: Base; MeNPOC-Cl, pyridine; 5'-O-MeNPOC product]

Base = THYMIDINE (T); N-4-ISOBUTYRYL 2'-DEOXYCYTIDINE (ibu-dC);
N-2-PHENOXYACETYL 2'-DEOXYGUANOSINE (PAC-dG); and N-6-PHENOXYACETYL
2'-DEOXYADENOSINE (PAC-dA)

All four of the 5'-MeNPOC nucleosides were prepared from the base-protected
2'-deoxynucleosides by the following procedure. The protected
2'-deoxynucleoside (90 mmole) was dried by co-evaporating twice with 250 mL
anhydrous pyridine. The nucleoside was then dissolved in



5,837,832 

31 32 

300 mL anhydrous pyridine (or 1:1 pyridine/DMF, for the dG
nucleoside) under argon and cooled to ~2° C. in an ice bath. A
solution of 24.6 g (90 mmole) MeNPOC-Cl in 100 mL dry THF was then
added with stirring over 30 minutes. The ice bath was removed, and
the solution allowed to stir overnight at room temperature (TLC:
5-10% MeOH in CH2Cl2; two diastereomers). After evaporating the
solvents under vacuum, the crude material was taken up in 250 mL
ethyl acetate and extracted with saturated aqueous NaHCO3 and
brine. The organic phase was then dried over Na2SO4, filtered and
evaporated to obtain a yellow foam. The crude products were
finally purified by flash chromatography (9x30 cm silica gel
column eluted with a stepped gradient of 2%-6% MeOH in CH2Cl2).
Yields of the purified diastereomeric mixtures are in the range
of 65-75%.

(2) 5'-MeNPOC-2'-DEOXYNUCLEOSIDE-3'-(N,N-DIISOPROPYL
2-CYANOETHYL PHOSPHORAMIDITES)

[Reaction scheme: phosphitylation of the 3'-OH of the 5'-MeNPOC
nucleoside (DIEA/DCM)]

The four deoxynucleosides were phosphitylated using either
2-cyanoethyl-N,N-diisopropyl chlorophosphoramidite, or
2-cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite. The
following is a typical procedure. Add 16.6 g (17.4 ml; 55 mmole)
of 2-cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite to a
solution of 50 mmole 5'-MeNPOC-nucleoside and 4.3 g (25 mmole)
diisopropylammonium tetrazolide in 250 mL dry CH2Cl2 under argon
at ambient temperature. Continue stirring for 4-16 hours (reaction
monitored by TLC: 45:45:10 hexane/CH2Cl2/Et3N). Wash the organic
phase with saturated aqueous NaHCO3 and brine, then dry over
Na2SO4, and evaporate to dryness. Purify the crude amidite by
flash chromatography (9x25 cm silica gel column eluted with
hexane/CH2Cl2/TEA 45:45:10 for A, C, T; or 0:90:10 for G). The
yield of purified amidite is about 90%.

II. PREPARATION OF LABELED DNA/HYBRIDIZATION TO ARRAY

1) PCR

PCR amplification reactions are typically conducted in a mixture
composed of, per reaction: 1 µl genomic DNA; 10 µl each primer
(10 pmol/µl stocks); 10 µl 10xPCR buffer (100 mM Tris.Cl pH 8.5,
500 mM KCl, 15 mM MgCl2); 10 µl 2 mM dNTPs (made from 100 mM dNTP
stocks); 2.5 U Taq polymerase (Perkin Elmer AmpliTaq™, 5 U/µl);
and H2O to 100 µl. The cycling conditions are usually 40 cycles
(94° C. 45 sec, 55° C. 30 sec, 72° C. 60 sec) but may need to be
varied considerably from sample type to sample type. These
conditions are for 0.2 mL thin wall tubes in a Perkin Elmer 9600
thermocycler. See Perkin Elmer 1992/93 catalogue for 9600 cycle
time information. Target, primer length and sequence composition,
among other factors, may also affect parameters.

For products in the 200 to 1000 bp size range, check 2 µl of the
reaction on a 1.5% 0.5xTBE agarose gel using an appropriate size
standard (phiX174 cut with HaeIII is convenient). The PCR reaction
should yield several picomoles of product. It is helpful to
include a negative control (i.e., 1 µl TE instead of genomic DNA)
to check for possible contamination. To avoid contamination, keep
PCR products from previous experiments away from later reactions,
using filter tips as appropriate. Using a set of working solutions
and storing master solutions separately is helpful, so long as one
does not contaminate the master stock solutions.

For simple amplifications of short fragments from genomic DNA it
is, in general, unnecessary to optimize Mg2+ concentrations. A
good procedure is the following: make a master mix minus enzyme;
dispense the genomic DNA samples to individual tubes or reaction
wells; add enzyme to the master mix; and mix and dispense the
master solution to each well, using a new filter tip each time.

2) PURIFICATION

Removal of unincorporated nucleotides and primers from PCR samples
can be accomplished using the Promega Magic PCR Preps DNA
purification kit. One can purify the whole sample, following the
instructions supplied with the kit (proceed from section IIIB,
'Sample preparation for direct purification from PCR reactions').
After elution of the PCR product in 50 µl of TE or H2O, one
centrifuges the eluate for 20 sec at 12,000 rpm in a microfuge and
carefully transfers 45 µl to a new microfuge tube, avoiding any
visible pellet. Resin is sometimes carried over during the elution
step. This transfer prevents accidental contamination of the
linear amplification reaction with 'Magic PCR' resin. Other
methods, e.g. size exclusion chromatography, may also be used.

3) LINEAR AMPLIFICATION

In a 0.2 mL thin-wall PCR tube mix: 4 µl purified PCR product;
2 µl primer (10 pmol/µl); 4 µl 10xPCR buffer; 4 µl dNTPs (2 mM dA,
dC, dG, 0.1 mM dT); 4 µl 0.1 mM dUTP; 1 µl 1 mM fluorescein dUTP
(Amersham RPN 2121); 1 U Taq polymerase (Perkin Elmer, 5 U/µl);
and add H2O to 40 µl. Conduct 40 cycles (92° C. 30 sec, 55° C.
30 sec, 72° C. 90 sec) of PCR. These conditions have been used to
amplify a 300 nucleotide mitochondrial DNA fragment but are
generally applicable. Even in the absence of a visible product
band on an agarose gel, there should still be enough product to
give an easily detectable hybridization signal. If one is not
treating the DNA with uracil DNA glycosylase (see Section 4), dUTP
can be omitted from the reaction.

4) FRAGMENTATION

Purify the linear amplification product using the Promega Magic
PCR Preps DNA purification kit, as per Section 2 above. In a
0.2 mL thin-wall PCR tube mix: 40 µl purified labeled DNA; 4 µl
10xPCR buffer; and 0.5 µl uracil DNA glycosylase (BRL 1 U/µl).
Incubate the mixture 15 min at 37° C., then 10 min at 97° C.;
store at -20° C. until ready to use.

5) HYBRIDIZATION, SCANNING & STRIPPING

A blank scan of the slide in hybridization buffer only is helpful
to check that the slide is ready for use. The buffer is removed
from the flow cell and replaced with 1 mL of (fragmented) DNA in
hybridization buffer and mixed well. The scan is performed in the
presence of the labeled target. FIG. 33 illustrates an
illustrative detection system for scanning a DNA chip. A series of
scans at 30 min intervals using a hybridization temperature of
25° C. yields a very clear signal, usually in at least 30 min to
two hours, but it may be desirable to hybridize longer, i.e.,
overnight. Using a laser power of 50 µW and 50 µm pixels, one
should obtain maximum counts in the range of hundreds to low
thousands/pixel for a new slide. When finished, the slide can be
stripped using 50% formamide, rinsing well in deionized H2O,
blowing dry, and storing at room temperature.

III. PREPARATION OF LABELED RNA/HYBRIDIZATION TO ARRAY

1) TAGGED PRIMERS

The primers used to amplify the target nucleic acid should have
promoter sequences if one desires to produce RNA from the
amplified nucleic acid. Suitable promoter sequences are shown
below and include:
(1) the T3 promoter sequence:
5'-CGGAATTAACCCTCACTAAAGG (SEQ. ID NO:298)
5'-[illegible]GGGAG; (SEQ. ID NO:299)
(2) the T7 promoter sequence:
5'-TAATACGACTCACTATAGGGAG; (SEQ. ID NO:300)
and (3) the SP6 promoter sequence:
5'-ATTTAGGTGACACTATAGAA. (SEQ. ID NO:301)
The desired promoter sequence is added to the 5' end of the PCR
primer. It is convenient to add a different promoter to each
primer of a PCR primer pair so that either strand may be
transcribed from a single PCR product.

Synthesize PCR primers so as to leave the DMT group on. DMT-on
purification is unnecessary for PCR but appears to be important
for transcription. Add 25 µl 0.5M NaOH to [illegible] of
oligonucleotide to keep the DMT group on. Deprotect using standard
chemistry; [illegible] is convenient.

HPLC purification is accomplished by drying down the
oligonucleotides, resuspending in 1 mL 0.1M TEAA (dilute 2.0M
stock in deionized water, filter through 0.2 micron filter) and
filtering through a 0.2 micron filter. Load 0.5 mL on reverse
phase HPLC (column can be a Hamilton PRP-1 semi-prep, #79426). The
gradient is 0-50% CH3CN over 25 min (program 0.2 µmol.prep.0-50,
25 min). Pool the desired fractions, dry down, resuspend in 200 µl
80% HAc. 30 min RT. Add 200 µl EtOH; dry down. Resuspend in 200 µl
H2O, plus 20 µl NaAc pH 5.5, 600 µl EtOH. Leave 10 min on ice;
centrifuge 12,000 rpm for 10 min in microfuge. Pour off
supernatant. Rinse pellet with 1 mL EtOH, dry, resuspend in 200 µl
H2O. Dry, resuspend in 200 µl TE. Measure A260, prepare a
10 pmol/µl solution in TE (10 mM Tris.Cl pH 8.0, 0.1 mM EDTA).
Following HPLC purification of a 42 mer, a yield in the vicinity
of 15 nmol from a 0.2 µmol scale synthesis is typical.

2) GENOMIC DNA PREPARATION

For obtaining genomic DNA from human hair, one can extract as few
as 5 hairs, including hair roots. On a clean and sterile surface,
one places the hair on a piece of parafilm and, after wiping a new
razor blade with EtOH, cuts off the roots; the roots are
transferred to a 1.5 mL microfuge tube using a pair of Millipore
forceps cleaned with EtOH. Add 500 µl (10 mM Tris.Cl pH 8.0, 10 mM
EDTA, 100 mM NaCl, 2% (w/v) SDS, 40 mM DTT, filter sterilized) to
the sample. Add 1.25 µl 20 mg/ml proteinase K (Boehringer).
Incubate at 55° C. for 2 hours, vortexing once or twice. Perform
2x0.5 mL 1:1 phenol:CHCl3 extractions. After each extraction,
centrifuge 12,000 rpm 5 min in a microfuge and recover 0.4 mL
supernatant. Add 35 µl NaAc pH 5.2 plus 1 mL EtOH. Place sample on
ice 45 min; then centrifuge 12,000 rpm 30 min, rinse, air dry
30 min, and resuspend in 100 µl TE.

3) PCR

PCR is performed in a mixture containing, per reaction: 1 µl
genomic DNA; 4 µl each primer (10 pmol/µl stocks); 4 µl 10xPCR
buffer (100 mM Tris.Cl pH 8.5, 500 mM KCl, 15 mM MgCl2); 4 µl 2 mM
dNTPs (made from 100 mM dNTP stocks); 1 U Taq polymerase (Perkin
Elmer, 5 U/µl); H2O to 40 µl. About 40 cycles (94° C. 30 sec,
55° C. 30 sec, 72° C. 30 sec) are performed, but cycling
conditions may need to be varied. These conditions are for 0.2 mL
thin wall tubes in Perkin Elmer 9600. For products in the 200 to
1000 bp size range, check 2 µl of the reaction on a 1.5% 0.5xTBE
agarose gel using an appropriate size standard. For larger or
smaller volumes (20-100 µl), one can use the same amount of
genomic DNA but adjust the other ingredients accordingly.

4) IN VITRO TRANSCRIPTION

Mix: 3 µl PCR product; 4 µl 5x buffer; 2 µl DTT; 2.4 µl 10 mM
rNTPs (100 mM solutions from Pharmacia); 0.48 µl 10 mM
fluorescein-UTP (Fluorescein-12-UTP, 10 mM solution, from
Boehringer Mannheim); 0.5 µl RNA polymerase (Promega T3 or T7 RNA
polymerase); and add H2O to 20 µl. Incubate at 37° C. for 3 h.
Check 2 µl of the reaction on a 1.5% 0.5xTBE agarose gel using a
size standard. 5x buffer is 200 mM Tris pH 7.5, 30 mM MgCl2, 10 mM
spermidine, 50 mM NaCl, and 100 mM DTT (supplied with enzyme). The
PCR product needs no purification and can be added directly to the
transcription mixture. A 20 µl reaction is suggested for an
initial test experiment and hybridization; a 100 µl reaction is
considered "preparative" scale (the reaction can be scaled up to
obtain more target). The amount of PCR product to add is variable;
typically a PCR reaction will yield several picomoles of DNA. If
the PCR reaction does not produce that much target, then one
should increase the amount of DNA added to the transcription
reaction (as well as optimize the PCR). The ratio of
fluorescein-UTP to UTP suggested above is 1:5, but ratios from 1:3
to 1:10 all work well. One can also label with biotin-UTP and
detect with streptavidin-FITC to obtain similar results as with
fluorescein-UTP detection.

For nondenaturing agarose gel electrophoresis of RNA, note that
the RNA band will normally migrate somewhat faster than the DNA
template band, although sometimes the two bands will comigrate.
The temperature of the gel can affect the migration of the RNA
band. The RNA produced from in vitro transcription is quite stable
and can be stored for months (at least) at -20° C. without any
evidence of degradation. It can be stored in unsterilized
6xSSPE/0.1% Triton X-100 at -20° C. for days (at least) and reused
twice (at least) for hybridization, without taking any special
precautions in preparation or during use. RNase contamination
should of course be avoided. When extracting RNA from cells, it is
preferable to work very rapidly and to use strongly denaturing
conditions. Avoid using glassware previously contaminated with
RNases. Use of new disposable plasticware (not necessarily
sterilized) is preferred, as new plastic tubes, tips, etc., are
essentially RNase free. Treatment with DEPC or autoclaving is
typically not necessary.

5) FRAGMENTATION

In a 0.2 mL thin-wall PCR tube mix: 18 µl RNA (direct from
transcription reaction; no purification required); 18 µl H2O; and
4 µl 1M Tris.Cl pH 9.0. Incubate at 99.9° C. for 60 min. Add to
1 mL hybridization buffer and store at -20° C. until ready to use.
The alkaline hydrolysis step is very reliable. The hydrolysed
target can be stored at -20° C. in 6xSSPE/0.1% Triton X-100 for at
least several days prior to use and can also be reused.

6) HYBRIDIZATION, SCANNING, & STRIPPING

A blank scan of the slide in hybridization buffer only is helpful
to check that the slide is ready for use. The buffer is removed
from the flow cell and replaced with 1 mL of (hydrolysed) RNA in
hybridization buffer and mixed well. Incubate for 15-30 min at
18° C. Remove the hybridization solution, which can be saved for
subsequent experiments. Rinse the flow cell 4-5 times with fresh
changes of 6xSSPE/0.1% Triton X-100, equilibrated to 18° C. The
rinses can be performed rapidly, but it is important to empty the
flow cell before each new rinse and to mix the liquid in the cell
thoroughly. The scan is performed in the presence of the labeled
target. A series of scans at 30 min intervals using a
hybridization temperature of 25° C. yields a very clear signal,
usually in at least 30 min to two hours, but it may be desirable
to hybridize longer, i.e., overnight. Using a laser power of 50 µW
and 50 µm pixels, one should obtain maximum counts in the range of
hundreds to low thousands/pixel for a new slide. When finished,
the slide can be stripped using 50% to 100% formamide at 50° C.
for 30 min, rinsing well in deionized H2O, blowing dry and storing
at room temperature.

These conditions are illustrative and assume a probe length of ~15
nucleotides. The stripping conditions suggested are fairly severe,
but some signal may remain on the [illegible] stringent.
Nevertheless, the counts [illegible] very low in comparison
[illegible] hybridization temperature and the longer the duration
of hybridization, the more difficult it is to strip the slide.
Longer targets may be more difficult to strip than shorter targets.
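The reagent quantities quoted in the MeNPOC-Cl synthesis (Part I, sections A to D) can be cross-checked by recomputing mole amounts from the stated masses. The following is a minimal sketch only and is not part of the described procedure; the molecular formulas are assumptions inferred from the compound names, and the atomic masses are standard rounded values.

```python
import re

# Standard atomic masses in g/mol, rounded; sufficient for a sanity check.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "Cl": 35.45, "Na": 22.990, "B": 10.81}


def molar_mass(formula):
    """Molar mass of a simple formula such as 'C9H9NO5' (no parentheses)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def mmol(mass_g, formula):
    """Amount of substance in millimoles for a given mass in grams."""
    return 1000.0 * mass_g / molar_mass(formula)


# Masses are from the text; formulas are assumed from the compound names.
REAGENTS = {
    "3,4-methylenedioxyacetophenone": (50.0, "C9H8O3"),
    "4,5-methylenedioxy-2-nitroacetophenone": (53.0, "C9H7NO5"),
    "1-(4,5-methylenedioxy-2-nitrophenyl)ethanol": (50.0, "C9H9NO5"),
    "MeNPOC-Cl": (24.6, "C10H8ClNO6"),
}

if __name__ == "__main__":
    for name, (grams, formula) in REAGENTS.items():
        print(f"{name}: {grams} g = {mmol(grams, formula):.0f} mmol")
```

Under these assumed formulas the computed amounts agree with the figures given in the text (0.305 mole, 0.25 mol, 237 mmole and 90 mmole respectively).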

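The master-mix procedure of section II.1 (make a master mix minus enzyme, dispense the genomic DNA, add enzyme, then dispense) lends itself to a small volume calculation. The sketch below is illustrative; the per-reaction volumes are those of the 100 µl reaction in the text, while the function name and the 10% pipetting overage are hypothetical choices and not part of the described protocol.

```python
# Per-reaction volumes (µl) of the 100 µl PCR described in section II.1.
PER_REACTION_UL = {
    "forward primer (10 pmol/µl)": 10.0,
    "reverse primer (10 pmol/µl)": 10.0,
    "10x PCR buffer": 10.0,
    "2 mM dNTPs": 10.0,
}
GENOMIC_DNA_UL = 1.0          # dispensed separately into each tube
TAQ_UNITS = 2.5               # units of Taq polymerase per reaction
TAQ_STOCK_U_PER_UL = 5.0      # stock concentration
TOTAL_UL = 100.0


def master_mix(n_reactions, overage=0.10):
    """Return (mix volumes minus enzyme, enzyme volume, µl to dispense per well).

    `overage` is a hypothetical allowance for pipetting losses.
    """
    scale = n_reactions * (1.0 + overage)
    taq_ul = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    water_ul = (TOTAL_UL - GENOMIC_DNA_UL - taq_ul
                - sum(PER_REACTION_UL.values()))
    mix = {name: vol * scale for name, vol in PER_REACTION_UL.items()}
    mix["H2O"] = water_ul * scale
    return mix, taq_ul * scale, TOTAL_UL - GENOMIC_DNA_UL


if __name__ == "__main__":
    mix, enzyme_ul, per_well_ul = master_mix(8)
    for name, vol in mix.items():
        print(f"{name}: {vol:.1f} µl")
    print(f"Taq polymerase (added last): {enzyme_ul:.1f} µl")
    print(f"Dispense {per_well_ul:.0f} µl onto 1 µl genomic DNA per well")
```

Keeping the enzyme as a separate return value mirrors the order of addition given in the text.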

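Section III.1 states that the desired promoter sequence is added to the 5' end of the PCR primer. A minimal sketch of that construction follows. The T3 sequence is the one given as SEQ. ID NO:298; the leading bases of the T7 entry and two bases of the SP6 entry are illegible in the scanned text and are filled in here from the commonly published promoter sequences, which is an assumption. The target-specific primer in the usage line is hypothetical.

```python
# Promoter tags, written 5' to 3'. See the note above on the T7 and SP6 entries.
PROMOTERS = {
    "T3": "CGGAATTAACCCTCACTAAAGG",
    "T7": "TAATACGACTCACTATAGGGAG",
    "SP6": "ATTTAGGTGACACTATAGAA",
}


def tag_primer(promoter, target_specific):
    """Prepend a promoter tag to the 5' end of a target-specific primer."""
    seq = target_specific.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in primer: {sorted(invalid)}")
    return PROMOTERS[promoter] + seq


if __name__ == "__main__":
    # Hypothetical 20-nucleotide target-specific sequence, for illustration.
    forward = tag_primer("T3", "acgtacgtacgtacgtacgt")
    reverse = tag_primer("T7", "ttgcatgcatgcatgcatgc")
    print(len(forward), forward)
    print(len(reverse), reverse)
```

Tagging each primer of a pair with a different promoter, as in the usage lines, allows either strand to be transcribed from a single PCR product, as the text notes.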

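The 1:5 ratio of fluorescein-UTP to UTP quoted for the in vitro transcription of section III.4 follows from the two pipetted volumes, and the same arithmetic gives the volume needed for another ratio in the suggested 1:3 to 1:10 range. The sketch below assumes that the "10 mM rNTPs" stock contains each rNTP, including UTP, at 10 mM; the function names are illustrative.

```python
def final_mM(stock_mM, volume_ul, total_ul):
    """Final concentration after diluting `volume_ul` of stock to `total_ul`."""
    return stock_mM * volume_ul / total_ul


def fluorescein_utp_ul(rntp_ul, utp_per_label=5.0,
                       rntp_stock_mM=10.0, label_stock_mM=10.0):
    """Volume of fluorescein-UTP stock giving a 1:`utp_per_label` label:UTP ratio."""
    return rntp_ul * rntp_stock_mM / (utp_per_label * label_stock_mM)


if __name__ == "__main__":
    total = 20.0  # µl reaction, as in the text
    utp = final_mM(10.0, 2.4, total)
    label = final_mM(10.0, 0.48, total)
    print(f"UTP {utp:.2f} mM, fluorescein-UTP {label:.2f} mM, "
          f"ratio 1:{utp / label:.0f}")
    print(f"1:10 labeling needs {fluorescein_utp_ul(2.4, 10.0):.2f} µl label")
```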
SEQUENCE LISTING

( 1 ) GENERAL INFORMATION:

( i i i ) NUMBER OF SEQUENCES: [illegible]

( 2 ) INFORMATION FOR SEQ ID NO:1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTCCTCACCT CAGCC

( 2 ) INFORMATION FOR SEQ ID NO:2:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TTGCTCACAT CAGCC    15

( 2 ) INFORMATION FOR SEQ ID NO:3:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TTOCTGACCT CAGCC

( 2 ) INFORMATION FOR SEQ ID NO:4:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TTGCTOACTT CAGCC



( 2 ) INFORMATION FOR SEQ ID NO:5:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 39 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CATTAAAOAA AATATCATCT TTOGTCTTTC CTATCATCA

( 2 ) INFORMATION FOR SEQ ID NO:6:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 36 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CATTAAAGAA AATATCATTG GTGTTTCCTA TGATGA

( 2 ) INFORMATION FOR SEQ ID NO:7:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 36 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CATTAAAOAA AATATCATTG GTGTTTCCTA TCATOA

( 2 ) INFORMATION FOR SEQ ID NO:8:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:8:

AACACCAATG ATCAT

( 2 ) INFORMATION FOR SEQ ID NO:9:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:9:

CCAAAGATNA TATTT



( 2 ) INFORMATION FOR SEQ ID NO:10:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:10:

ACCAAACANC ATATT

( 2 ) INFORMATION FOR SEQ ID NO:11:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:11:

CACCAAACNT CATAT

( 2 ) INFORMATION FOR SEQ ID NO:12:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:12:

ACACCAAAXA TCATA

( 2 ) INFORMATION FOR SEQ ID NO:13:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:13:

AACACCAANC ATGAT

( 2 ) INFORMATION FOR SEQ ID NO:14:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:14:

AAACACCANA CATCA

( 2 ) INFORMATION FOR SEQ ID NO:15:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:15:

OAAACACCNA ACATC

( 2 ) INFORMATION FOR SEQ ID NO:16:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:16:

OGAAACACNA AACAT

( 2 ) INFORMATION FOR SEQ ID NO:17:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:17:

AGGAAACANC AAAGA    15



( 2 ) INFORMATION FOR SEQ ID NO:18:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 21 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:18:

CCTTCACAOC CTAAAATTAA G    21

( 2 ) INFORMATION FOR SEQ ID NO:19:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 21 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:19:

CCTTCAGAOT CTAAAATTAA G    21

( 2 ) INFORMATION FOR SEQ ID NO:20:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 44 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:20:

TAATACCACT CACTATACCG ACATCACCTA ATAATGATCC GTTT

( 2 ) INFORMATION FOR SEQ ID NO:21:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 43 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:21:

TAATACCACT CACTATACCG ACTACTCTCA ACGGTTCATA TOC

( 2 ) INFORMATION FOR SEQ ID NO:22:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 45 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:22:

CTCGCAATTA ACCCTCACTA AAGGTAGTGT CAAGGOTTCA TATGC

( 2 ) INFORMATION FOR SEQ ID NO:23:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 43 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:23:

TAATACCACT CACTATACCG AGACCATACT AAAAGTGACT CTC

( 2 ) INFORMATION FOR SEQ ID NO:24:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 44 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:24:

TAATACGACT CACTATACCC AGACATCAAT CACATTTACA CCAA

( 2 ) INFORMATION FOR SEQ ID NO:25:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 44 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:25:

CCCAATTAAC CCTCACTAAA CGACATCAAT CACATTTACA CCAA



( 2 ) INFORMATION FOR SEQ ID NO:26:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:26:

TTTATCGOGT GA

( 2 ) INFORMATION FOR SEQ ID NO:27:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:27:

TTGATTTATG GC

( 2 ) INFORMATION FOR SEQ ID NO:28:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:28:

AACCTATTTC ATT

( 2 ) INFORMATION FOR SEQ ID NO:29:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:29:

CGACCAAACC TA

( 2 ) INFORMATION FOR SEQ ID NO:30:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:30:

ACGCTACCAC CA

( 2 ) INFORMATION FOR SEQ ID NO:31:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: [illegible] base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:31:

GGtcf CTCTC TGC



( 2 ) INFORMATION FOR SEQ ID NO:32:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:32:

CCGTOTCTOT CTGC    14

( 2 ) INFORMATION FOR SEQ ID NO:33:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:33:

OGTGTGTCTC TGCT

( 2 ) INFORMATION FOR SEQ ID NO:34:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:34:

CTCCCTACGA TO

( 2 ) INFORMATION FOR SEQ ID NO:35:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:35:

TOCTGGGTAG GA    12

( 2 ) INFORMATION FOR SEQ ID NO:36:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:36:

TCTCCTCGCT AC

( 2 ) INFORMATION FOR SEQ ID NO:37:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:37:

CTTACCACCG CT

( 2 ) INFORMATION FOR SEQ ID NO:38:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:38:

CCCTTACCAC CC

( 2 ) INFORMATION FOR SEQ ID NO:39:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:39:

ACCGCCGCAG C

( 2 ) INFORMATION FOR SEQ ID NO:40:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:40:

ACCCGCCCAG    10

( 2 ) INFORMATION FOR SEQ ID NO:41:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:41:

OCTTGCTTCC C    11



( 2 ) INFORMATION FOR SEQ ID NO:42:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:42:

OCOTTTCOTT GO

( 2 ) INFORMATION FOR SEQ ID NO:43:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:43:

CATCTTTCCG CT    12

( 2 ) INFORMATION FOR SEQ ID NO:44:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:44:

CGGTOATCTT TO    12

( 2 ) INFORMATION FOR SEQ ID NO:45:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:45:

TGTGOGCGGT CA    12

( 2 ) INFORMATION FOR SEQ ID NO:46:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:46:

TAAACTCTCO OG    12

( 2 ) INFORMATION FOR SEQ ID NO:47:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:47:

CCTACATAAA CTO    13

( 2 ) INFORMATION FOR SEQ ID NO:48:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:48:

CACCTAAGCT ACA    13



( 2 ) INFORNCATION FOR SEQ ID NO:49: 

( I ) SEQUENCE CKARACTERC^CS: 
( A)LENCTK:12 biMpaln 
( B)TYPE:aDcl<:cKl<l 
( C ) SntANDEDNESS: sm jle 
( D)TOPOLOCY:loev 

( I i ) MOLECULE TYPE: D.VA (probe) 

( M I ) SEQUENCE DESCRIPTION: SEQ ID N0:*9: 

. GAGCAGGTAA GC . 



( 2 ) INFORMAnON FOR SEQ ID NOJO: 

{ I ) SEQUENCE CHARACIERISnCS: 
( A ) LENGTH: I2bu« pain 
( B )TYPE:o«ltic kM 
( C ) STRANDEDNESS: «ta£le 
( D ) TOPOLOGY: linear 

( I I ) MOLECULE TYPE: D.VA (pobe) 

( It I ) SEQUENCE OESCRIFTTON: SEQ tD NO-JQ: 

TGCTTTGAOG AC 



( 2 ) [NFORMAHON FOR SEQ ID HOJU 

( t ) SEQUENCE CHARACT ER1& I I CS; 

( A)L£NGTK:Ubas«paIn 
( B )TYPE:oodtIe kU 
( C ) STKANDEDNESS: sb jte 
( D)TOPOLOGY^ linear 

(II ) MOLECULE TYPE: DNA(frob«) 

( I I ) SEQUENCE OESCROOTON: SEQ ID NO-Jl: 

AGTCTATTCC TTT 



( 2 ) INFORMAnON FOR SEQ ID N0-J3: 

( I ) SEQUENCE CKARACTERIsnCS: 
( A ) lENGTH: U bas* pain 
( B) TYPE: aadeie Kid 
( C ) SnUKDEDNESS: *hx^z 
( D)T0P0L0GY:[fficar 

( 1 i ) MOLECULE TYPE: DNA (probe) 



(Ml) SEQUENCE DESCRIPTION: SEQ ID NOJZ: 
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CATTTTCAGT OTA 



1 i 



( 2 ) INFORMATION FOR SEQ ID NO:53:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:53:

TAAACATTTT CAC 13



( 2 ) INFORMATION FOR SEQ ID NO:54:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:54:

AGCCCGTCTA AA 12



( 2 ) INFORMATION FOR SEQ ID NO:55:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:55:



( 2 ) INFORMATION FOR SEQ ID NO:56:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:56:

TCATCTCACC CC



( 2 ) INFORMATION FOR SEQ ID NO:57:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:57:

GGCCTCATCT CA



( 2 ) INFORMATION FOR SEQ ID NO:58:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:58:

CACTCCCACG C

( 2 ) INFORMATION FOR SEQ ID NO:59:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:59:

CTATCCOAGT CO

( 2 ) INFORMATION FOR SEQ ID NO:60:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:60:

CATTAGTAGT ATOG



( 2 ) INFORMATION FOR SEQ ID NO:61:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:61:

TGAATCAGAT TAG 13



( 2 ) INFORMATION FOR SEQ ID NO:62:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:62:

ATTOAATCAG ATT



( 2 ) INFORMATION FOR SEQ ID NO:63:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
( ii ) MOLECULE TYPE: DNA (probe)
( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:63:
GOCTTCTATT CAA

( 2 ) INFORMATION FOR SEQ ID NO:64:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:64:

CCCCCCCTTC

( 2 ) INFORMATION FOR SEQ ID NO:65:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:65:

ATCCOCGCGC 10



( 2 ) INFORMATION FOR SEQ ID NO:66:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:66:

TAGGATGCGC C 11



( 2 ) INFORMATION FOR SEQ ID NO:67:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:67:

TCCGTACGAT CC 12



( 2 ) INFORMATION FOR SEQ ID NO:68:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:68:

CTCCTCCCTA GC

( 2 ) INFORMATION FOR SEQ ID NO:69:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:69:

TGTGTGTOCT CC



( 2 ) INFORMATION FOR SEQ ID NO:70:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:70:

GCGGTGTGTG TO



( 2 ) INFORMATION FOR SEQ ID NO:71:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:71:

TAOCACCCGT GT 12



( 2 ) INFORMATION FOR SEQ ID NO:72:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:72:

TGGCGTTAGC AG



( 2 ) INFORMATION FOR SEQ ID NO:73:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:73:

OCTATCGCCT TA 12



( 2 ) INFORMATION FOR SEQ ID NO:74:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:74:

OTTCCOCCTA TO



( 2 ) INFORMATION FOR SEQ ID NO:75:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:75:

OCTCOTCTTA CO



( 2 ) INFORMATION FOR SEQ ID NO:76:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:76:

CGTTACOCTC OT



( 2 ) INFORMATION FOR SEQ ID NO:77:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:77:

AAATCTGGTT AGO



( 2 ) INFORMATION FOR SEQ ID NO:78:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:78:

AAATTTCAAA TCT



( 2 ) INFORMATION FOR SEQ ID NO:79:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
( ii ) MOLECULE TYPE: DNA (probe)
( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:79:
AAOATAAAAT TTC



( 2 ) INFORMATION FOR SEQ ID NO:80:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:80:

CCCAAAAACA TA



( 2 ) INFORMATION FOR SEQ ID NO:81:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:81:

CGCCAAAAAC A

( 2 ) INFORMATION FOR SEQ ID NO:82:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:82:

CATACCOCCA A



( 2 ) INFORMATION FOR SEQ ID NO:83:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:83:

AAAAOTCCAT ACC



( 2 ) INFORMATION FOR SEQ ID NO:84:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:84:

TGTTAAAAGT CCA



( 2 ) INFORMATION FOR SEQ ID NO:85:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:85:

GGGTCACTCT TAA



( 2 ) INFORMATION FOR SEQ ID NO:86:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:86:

CGGGGTGACT GT



( 2 ) INFORMATION FOR SEQ ID NO:87:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:87:

AGTTCGGCCG T



( 2 ) INFORMATION FOR SEQ ID NO:88:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:88:

TGTCTTAGTT OGC



( 2 ) INFORMATION FOR SEQ ID NO:89:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:89:

AAAATAATOT CTT



( 2 ) INFORMATION FOR SEQ ID NO:90:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:90:

ACCGCAAAAT AA



( 2 ) INFORMATION FOR SEQ ID NO:91:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:91:

GOACCGCAAA AT



( 2 ) INFORMATION FOR SEQ ID NO:92:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:92:

OGAAATTTTT TO



( 2 ) INFORMATION FOR SEQ ID NO:93:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:93:

CCTCGAAATT TT



( 2 ) INFORMATION FOR SEQ ID NO:94:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:94:

CGTTTCGTOG A



( 2 ) INFORMATION FOR SEQ ID NO:95:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
( ii ) MOLECULE TYPE: DNA (probe)
( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:95:



( 2 ) INFORMATION FOR SEQ ID NO:96:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:96:

CCCCCGCACO 10



( 2 ) INFORMATION FOR SEQ ID NO:97:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:97:

CACAACCGCC C 11



( 2 ) INFORMATION FOR SEQ ID NO:98:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:98:

CTACCCCACA AG 12



( 2 ) INFORMATION FOR SEQ ID NO:99:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:99:

OTOCTGTACC CC 12



( 2 ) INFORMATION FOR SEQ ID NO:100:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:100:

TCTTTAACTO CTC 13



( 2 ) INFORMATION FOR SEQ ID NO:101:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:101:

TGTOTTTAAG TGC 13



( 2 ) INFORMATION FOR SEQ ID NO:102:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:102:

OCAGAOATGT GTT 13



( 2 ) INFORMATION FOR SEQ ID NO:103:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:103:

TTTOCCAGAO AT



( 2 ) INFORMATION FOR SEQ ID NO:104:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:104:

CGGGTTTCCC A



( 2 ) INFORMATION FOR SEQ ID NO:105:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:105:

TCTTTTTGGG CT



( 2 ) INFORMATION FOR SEQ ID NO:106:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:106:

TTTOTTTTTG OC

( 2 ) INFORMATION FOR SEQ ID NO:107:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:107:

GGCTTCTTTG TT

( 2 ) INFORMATION FOR SEQ ID NO:108:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:108:

CTOTTAOCGT TCT

( 2 ) INFORMATION FOR SEQ ID NO:109:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:109:

TTTACTAAGT ATGT



( 2 ) INFORMATION FOR SEQ ID NO:110:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:110:

AACACACTTT AGT 13



( 2 ) INFORMATION FOR SEQ ID NO:111:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
( ii ) MOLECULE TYPE: DNA (probe)
( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:111:
AATTAATTAA CACA



( 2 ) INFORMATION FOR SEQ ID NO:112:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:112:

AAOCATTAAT TAA 13



( 2 ) INFORMATION FOR SEQ ID NO:113:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:113:

GTCCTACAAC CAT 13



( 2 ) INFORMATION FOR SEQ ID NO:114:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:114:

TOTCCTACAA GCA 13



( 2 ) INFORMATION FOR SEQ ID NO:115:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:115:

ATTATTATGT CCT 13



( 2 ) INFORMATION FOR SEQ ID NO:116:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:116:

TTGTTATTAT TATC



( 2 ) INFORMATION FOR SEQ ID NO:117:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:117:

ATTCAAATTG TTA 13



( 2 ) INFORMATION FOR SEQ ID NO:118:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:118:

CCACACATTC AAA



( 2 ) INFORMATION FOR SEQ ID NO:119:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:119:



( 2 ) INFORMATION FOR SEQ ID NO:120:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:120:

AAACTCGCTG TG



( 2 ) INFORMATION FOR SEQ ID NO:121:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:121:

TGTCTCGAAA CTC



( 2 ) INFORMATION FOR SEQ ID NO:122:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:122:

OATGTCTGTC TCC 13



( 2 ) INFORMATION FOR SEQ ID NO:123:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:123:

ATGATCTCTO TCT



( 2 ) INFORMATION FOR SEQ ID NO:124:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:124:

TTTTGTTATO ATC 13



( 2 ) INFORMATION FOR SEQ ID NO:125:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:125:

TTTTTTGTTA TCA 13



( 2 ) INFORMATION FOR SEQ ID NO:126:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:126:

ATACCGTGCT CC 12



( 2 ) INFORMATION FOR SEQ ID NO:127:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
( ii ) MOLECULE TYPE: DNA (probe)
( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:127:
CCGACATAOO bT



( 2 ) INFORMATION FOR SEQ ID NO:128:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:128:

TACTOCCACA TAG 13



( 2 ) INFORMATION FOR SEQ ID NO:129:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:129:

OACAGATACT bcC

( 2 ) INFORMATION FOR SEQ ID NO:130:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:130:

AATCAAACAC ACA



( 2 ) INFORMATION FOR SEQ ID NO:131:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:131:

ACOAATCAAA OAC 13



( 2 ) INFORMATION FOR SEQ ID NO:132:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:132:

TCACGCACGA AT



( 2 ) INFORMATION FOR SEQ ID NO:133:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:133:

ACCATCACCC AC 12



( 2 ) INFORMATION FOR SEQ ID NO:134:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:134:

AAATAATACG ATC

( 2 ) INFORMATION FOR SEQ ID NO:135:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:135:

GCCATAAATA AT



( 2 ) INFORMATION FOR SEQ ID NO:136:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:136:

TAGCATCCGA TA 12



( 2 ) INFORMATION FOR SEQ ID NO:137:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:137:

CTACCATCCO AT 12



( 2 ) INFORMATION FOR SEQ ID NO:138:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:138:

TTCAACCTAO OA 12



( 2 ) INFORMATION FOR SEQ ID NO:139:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:139:

AATATTGAAC OTA 13



( 2 ) INFORMATION FOR SEQ ID NO:140:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:140:

OCCTGTAATA TTG 13



( 2 ) INFORMATION FOR SEQ ID NO:141:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:141:

TOTTCOCCTG TA 13



( 2 ) INFORMATION FOR SEQ ID NO:142:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:142:

OTATGTTCGC CT 12



( 2 ) INFORMATION FOR SEQ ID NO:143:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
( ii ) MOLECULE TYPE: DNA (probe)
( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:143:
cfcCCCTCAC TC 12



( 2 ) INFORMATION FOR SEQ ID NO:144:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:144:

CAGAGCTCCC CT

( 2 ) INFORMATION FOR SEQ ID NO:145:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:145:

ATOGAOAGCT CC



( 2 ) INFORMATION FOR SEQ ID NO:146:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:146:

AATGCATGCA GA 12



( 2 ) INFORMATION FOR SEQ ID NO:147:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:147:

ATACCAAATC CA 12



( 2 ) INFORMATION FOR SEQ ID NO:148:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:148:

CACCAAAATA CCA



( 2 ) INFORMATION FOR SEQ ID NO:149:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:149:

CCCAGACCAA A



( 2 ) INFORMATION FOR SEQ ID NO:150:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:150:

TACCCCCCAG A



( 2 ) INFORMATION FOR SEQ ID NO:151:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:151:



( 2 ) INFORMATION FOR SEQ ID NO:152:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:152:

TCOCGTCCAT AC



( 2 ) INFORMATION FOR SEQ ID NO:153:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:153:

CACTATCCCC TO



( 2 ) INFORMATION FOR SEQ ID NO:154:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:154:

ATOACTATCG CO



( 2 ) INFORMATION FOR SEQ ID NO:155:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:155:

CTCGCAATGA CT 12



( 2 ) INFORMATION FOR SEQ ID NO:156:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:156:

CCTCTCGCAA TG 12



( 2 ) INFORMATION FOR SEQ ID NO:157:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:157:

CTCCACCCTC TC 12



( 2 ) INFORMATION FOR SEQ ID NO:158:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:158:

TCCCGCTCCA G 11



( 2 ) INFORMATION FOR SEQ ID NO:159:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
( ii ) MOLECULE TYPE: DNA (probe)
( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:159:
CTCCTCCOCC T



( 2 ) INFORMATION FOR SEQ ID NO:160:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:160:

CACCCTCAAG TAG 13



( 2 ) INFORMATION FOR SEQ ID NO:161:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:161:

TTTATCACCC TGA 13



( 2 ) INFORMATION FOR SEQ ID NO:162:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:162:

TTTACOCTTT ATG 13



( 2 ) INFORMATION FOR SEQ ID NO:163:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:163:

CCTATTTAOG CT 12



( 2 ) INFORMATION FOR SEQ ID NO:164:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:164:

TCGGCTATTT AC



( 2 ) INFORMATION FOR SEQ ID NO:165:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:165:

ACGTCTCCGC TA



( 2 ) INFORMATION FOR SEQ ID NO:166:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:166:

ACGGCAACCT CT



( 2 ) INFORMATION FOR SEQ ID NO:167:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (probe)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:167:

TTTAAOOOGA AC



( 2 ) INFORMATION FOR SEQ ID NO:168:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:168:

ATdTCTTATT TAAC 



( 2 ) INFORMATION FOR SEQ ID NO:169:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:169:

CATCCTGATC TCT 



( 2 ) INFORMATION FOR SEQ ID NO:170:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:170:

TCCATCCTOA TO

( 2 ) INFORMATION FOR SEQ ID NO:171:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:171:

GATOATCCAT CO 



( 2 ) INFORMATION FOR SEQ ID NO:172:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:172:

AGACCTCATO ATC 13 



( 2 ) INFORMATION FOR SEQ ID NO:173:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:173:

CCGTGATACA CCT 13 



( 2 ) INFORMATION FOR SEQ ID NO:174:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:174:

ATACGCTGAT ACA 13



( 2 ) INFORMATION FOR SEQ ID NO:175:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:175:

TOGTTAATAG CC



( 2 ) INFORMATION FOR SEQ ID NO:176:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:176:

CTOACTOCTT AAT 



( 2 ) INFORMATION FOR SEQ ID NO:177:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:177:

TGTGCCOCAT AT 



( 2 ) INFORMATION FOR SEQ ID NO:178:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:178:

ACTCTTOTCC GG 



( 2 ) INFORMATION FOR SEQ ID NO:179:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:179:

TAOCACTCTT GTG 



( 2 ) INFORMATION FOR SEQ ID NO:180:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:180:

CCACACTACC ACT



( 2 ) INFORMATION FOR SEQ ID NO:181:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:181:

CCCACCACAC TA 



( 2 ) INFORMATION FOR SEQ ID NO:182:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:182:

CCOACCCAGG A 



( 2 ) INFORMATION FOR SEQ ID NO:183:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:183:

CGCCCGGAGC 



( 2 ) INFORMATION FOR SEQ ID NO:184:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:184:

TTATOGGCCC G 



( 2 ) INFORMATION FOR SEQ ID NO:185:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:185:

ACTCTTATCC GC 



( 2 ) INFORMATION FOR SEQ ID NO:186:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:186:

TACCCCCAAC TO 



( 2 ) INFORMATION FOR SEQ ID NO:187:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:187:

TTTAGCTACC CC 



( 2 ) INFORMATION FOR SEQ ID NO:188:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:188:

TTCACTTTAG CTA 



( 2 ) INFORMATION FOR SEQ ID NO:189:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:189:

TACAOTTCAC TTT 



( 2 ) INFORMATION FOR SEQ ID NO:190:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:190:

TCCAGATACA GTT 



( 2 ) INFORMATION FOR SEQ ID NO:191:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:191:

CACATCTCGA OAT



( 2 ) INFORMATION FOR SEQ ID NO:192:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:192:

ACCAACCACA TG 



( 2 ) INFORMATION FOR SEQ ID NO:193:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:193:

CAACTACOAA CCA



( 2 ) INFORMATION FOR SEQ ID NO:194:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:194:

OACTCTAATC TOO 



( 2 ) INFORMATION FOR SEQ ID NO:195:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:195:

OOGATTTCAC TOT 



( 2 ) INFORMATION FOR SEQ ID NO:196:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:196:

AOCGATTTGA CT



( 2 ) INFORMATION FOR SEQ ID NO:197:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:197:

ACCAGAAGCG AT 



( 2 ) INFORMATION FOR SEQ ID NO:198:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:198:

TCCCCACGAC AA 



( 2 ) INFORMATION FOR SEQ ID NO:199:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:199:

ATCCATGGOG AC 



( 2 ) INFORMATION FOR SEQ ID NO:200:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:200:

CGTCATCCAT GO 



( 2 ) INFORMATION FOR SEQ ID NO:201:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:201:

AGCCCGGTCA T 



( 2 ) INFORMATION FOR SEQ ID NO:202:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:202:

TATCTGACCC CC 



( 2 ) INFORMATION FOR SEQ ID NO:203:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:203:

ACCCCTATCT OA 



( 2 ) INFORMATION FOR SEQ ID NO:204:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:204:

ACGCACCCCT A 



( 2 ) INFORMATION FOR SEQ ID NO:205:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:205:

TCGTCAACCC AC 12 



( 2 ) INFORMATION FOR SEQ ID NO:206:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:206:

COATCGTCGT CA 1 2 



( 2 ) INFORMATION FOR SEQ ID NO:207:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:207:

AGGATCCTCO TC

( 2 ) INFORMATION FOR SEQ ID NO:208:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:208:

ACACCCACCA TO 

1 2 

( 2 ) INFORMATION FOR SEQ ID NO:209:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:209:

TCATTTACAC CO

( 2 ) INFORMATION FOR SEQ ID NO:210:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:210:

GCCATATTGA TTT 



( 2 ) INFORMATION FOR SEQ ID NO:211:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:211:

CTGCCATTTO OA 



( 2 ) INFORMATION FOR SEQ ID NO:212:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:212:

ACCCCTCGCA T

( 2 ) INFORMATION FOR SEQ ID NO:213:

( i ) SEQUENCE CHARACTERISTICS:
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:213:

CCTCACCCCT C 



( 2 ) INFORMATION FOR SEQ ID NO:214:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:214:

ACTCCGTCAC CG 



1 2 



( 2 ) INFORMATION FOR SEQ ID NO:215:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:215:

CTATCCTACT GOG 

( 2 ) INFORMATION FOR SEQ ID NO:216:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:216:

TTTOTTGGTA TCC 



( 2 ) INFORMATION FOR SEQ ID NO:217:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:217:

CTACCTTTCT TCC 



( 2 ) INFORMATION FOR SEQ ID NO:218:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:218:

TGCOTACOTT TC 12 



( 2 ) INFORMATION FOR SEQ ID NO:219:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:219:

TAAGOGTCGG TA 12 



( 2 ) INFORMATION FOR SEQ ID NO:220:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:220:

GTACTGTTAA GGG 



( 2 ) INFORMATION FOR SEQ ID NO:221:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:221:

TGTACTATGT ACTG 



( 2 ) INFORMATION FOR SEQ ID NO:222:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:222:

GCCTTTATGT ACT 13 



( 2 ) INFORMATION FOR SEQ ID NO:223:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:223:

AAATOGCTTT AT



( 2 ) INFORMATION FOR SEQ ID NO:224:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:224:

CCTAAATGCC TT 



( 2 ) INFORMATION FOR SEQ ID NO:225:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:225:

TCTACCCTAA ATC 13

( 2 ) INFORMATION FOR SEQ ID NO:226:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:226:

CTCCTAATGT ACC 13



( 2 ) INFORMATION FOR SEQ ID NO:227:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:227:

TAATGTOCTA ATC 13 



( 2 ) INFORMATION FOR SEQ ID NO:228:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:228:

CATCCCCACC C

( 2 ) INFORMATION FOR SEQ ID NO:229:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:229:

TCTAACCATG GO 



( 2 ) INFORMATION FOR SEQ ID NO:230:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:230:

TTCCTTCTAA CCA 13 



( 2 ) INFORMATION FOR SEQ ID NO:231:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:231:

TCTACTTGCT TOT 13



( 2 ) INFORMATION FOR SEQ ID NO:232:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:232:

TTGCTGTACT TGC 13



( 2 ) INFORMATION FOR SEQ ID NO:233:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:233:

CCTTGATTCC TC 12 



( 2 ) INFORMATION FOR SEQ ID NO:234:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:234:

TTOACCGTTO AT 



( 2 ) INFORMATION FOR SEQ ID NO:235:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:235:

CTCATACTTG AGO 



( 2 ) INFORMATION FOR SEQ ID NO:236:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:236:

TTCATCTGTC ATA 



( 2 ) INFORMATION FOR SEQ ID NO:237:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:237:

TGCACTTCAT CTC 



( 2 ) INFORMATION FOR SEQ ID NO:238:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:238:

TOCAGTTGCA CT 



( 2 ) INFORMATION FOR SEQ ID NO:239:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:239:

ATTTGCAGTT CC



( 2 ) INFORMATION FOR SEQ ID NO:240:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:240:

TACCOTACAA TAT 



( 2 ) INFORMATION FOR SEQ ID NO:241:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:241:

TGCTACCGTA CAA



( 2 ) INFORMATION FOR SEQ ID NO:242:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:242:

TATTTATCGT ACC 



( 2 ) INFORMATION FOR SEQ ID NO:243:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:243:

CGTCAACTAT TTA 



( 2 ) INFORMATION FOR SEQ ID NO:244:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:244:

TACACCTGGT CAA

( 2 ) INFORMATION FOR SEQ ID NO:245:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:245:



ATGTACTACA OCT 

( 2 ) INFORMATION FOR SEQ ID NO:246:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:246:

CGTTTTTATC TAC 13



( 2 ) INFORMATION FOR SEQ ID NO:247:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:247:



( 2 ) INFORMATION FOR SEQ ID NO:248:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:248:

TCTACCATTG CC 12 



( 2 ) INFORMATION FOR SEQ ID NO:249:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:249:

OTTTTCATCT AGO 13



( 2 ) INFORMATION FOR SEQ ID NO:250:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:250:

OGOTTTTGAT CT



( 2 ) INFORMATION FOR SEQ ID NO:251:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:251:

CCACCCCCTT T 



( 2 ) INFORMATION FOR SEQ ID NO:252:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:252:

GTCAATACTT GCC 



( 2 ) INFORMATION FOR SEQ ID NO:253:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:253:

CGCTCAGTCA ATA 



( 2 ) INFORMATION FOR SEQ ID NO:254:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:254:

TCGGTCAGTC AA 



( 2 ) INFORMATION FOR SEQ ID NO:255:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:255:

TGTTCATGCC Yc

( 2 ) INFORMATION FOR SEQ ID NO:256:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:256:

COOTTGTTGA TO 



( 2 ) INFORMATION FOR SEQ ID NO:257:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:257:

ACATACCGGT TG



( 2 ) INFORMATION FOR SEQ ID NO:258:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:258:

CAAAATACAT ACC 



( 2 ) INFORMATION FOR SEQ ID NO:259:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:259:

AATGTACOAA AAT 



( 2 ) INFORMATION FOR SEQ ID NO:260:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:260:

GCAGTAATCT ACC

( 2 ) INFORMATION FOR SEQ ID NO:261:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:261:

TGOCTOCCAC TA 



( 2 ) INFORMATION FOR SEQ ID NO:262:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:262:



( 2 ) INFORMATION FOR SEQ ID NO:263:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:263:

ACAATATTCA TGC 



( 2 ) INFORMATION FOR SEQ ID NO:264:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:264:

TAGAATCTTA OCT 13 



( 2 ) INFORMATION FOR SEQ ID NO:265:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:265:

TTTAAATTAC AAT 13 



( 2 ) INFORMATION FOR SEQ ID NO:266:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:266:

CAATAAGTTT AAA 

( 2 ) INFORMATION FOR SEQ ID NO:267:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:267:

CAACACACAA TAA 



13



( 2 ) INFORMATION FOR SEQ ID NO:268:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:268:

AAACAACACA CAA 



1 3 



( 2 ) INFORMATION FOR SEQ ID NO:269:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:269:

CCCATCAAAC AA 12 



( 2 ) INFORMATION FOR SEQ ID NO:270:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:270:

TTCCCCATGA AA 12 



( 2 ) INFORMATION FOR SEQ ID NO:271:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:271:

ATCTCCTTCC CC



( 2 ) INFORMATION FOR SEQ ID NO:272:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:272:

CAAATCTCCT TC 



( 2 ) INFORMATION FOR SEQ ID NO:273:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:273:

CGTACCCAAA TC

( 2 ) INFORMATION FOR SEQ ID NO:274:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:274:

OCTCOTACCC AA



( 2 ) INFORMATION FOR SEQ ID NO:275:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:275:

TACTTCCOTO CT 12 



( 2 ) INFORMATION FOR SEQ ID NO:276:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:276:

TCCAAAAACC TT

( 2 ) INFORMATION FOR SEQ ID NO:277:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:277:

CTCCTTCCAA AA 

( 2 ) INFORMATION FOR SEQ ID NO:278:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:278:

ATTTCTCCTT CO 



12



( 2 ) INFORMATION FOR SEQ ID NO:279:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:279:

CTCTCATTTG ICC 



1 3 



( 2 ) INFORMATION FOR SEQ ID NO:280:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:280:

TTTTTCtCTC ATT 



1 3 



( 2 ) INFORMATION FOR SEQ ID NO:281:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:281:

TAAACACTTT TTC 



1 3 



( 2 ) INFORMATION FOR SEQ ID NO:282:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:282:

GTCOACTTAA ACA 



( 2 ) INFORMAnOH FOR SEQ D) NO-JSJ: 

( I ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: U bajc piin 
( B)TY?E:aocleie »cld 
( C ) SnUNDEONESS: sb^e 
(O)TDFaLOGY: linear 

( I I ) MOl£a/U TYPE: DNA (probe) 

( « I ) SEQUENCE DESCRIPTION: SEQ ID NO-^: 

TCCTCCAGTT AAA 



( 2 ) INFOR.\CAnON FOR SEQ ID NO-JW: 

( I ) SEQUENCE CHARAC^THUSnCS: 
( A ) LENGTH: 13 baie pain 
( B ) TYPE: sadeic scid 
( C ) STRANDEONESS: sla^e 
(D)TOPOLOCY: linear 

(II )M6La:Ul£T)fPE:DNA (probe) 

( X I ) SEQUENCE DESCRIPTION: SEQ DO NO:284: 

TGCTAATGGT CO 



( 2 ) INFORMAnON FOR SEQ ID HO-JSS: 

( I ) SEQUENCE CKARACIERCTTCS: 
( A)LENGTK: l2baMpaIr> 
( B) TYPE: sadeic •etd 
( C ) SIRANDEONESS: single 
(0)TDPOLOGY:tiDear 

( I i )M01B:ULE TYPE: DNA (probe) 

( X I ) SEQUENCE DESCRIPTION: SEQ ID NO-J85: 

TTGCOTCCTA AT 



. ( 2 ) INFORMAnON FOR SEQ ID K0:2W: 

( I )SEQUENCECHARACTERISnCS: 
( A ) LENGTH: L2 bue ptin 
( B) TYPE- Bvlcie add 
( C } STRANDEONESS: 
( D)TOroLOGKlhear 

( t I ) MOLECULE TYPE: DNA (probe) 

( I I ) SEQUENCE DESCRIPTION: SEQ ID NO-^: 

TACCTTTCGC TO 



( 2 ) INFORMAnON FOR SEQ ID NOW: 

( I ) SEQUENCE CHARACIEROTCS: 
( A ) LENGTH: 13 bue patn 
(B)TYPE: add 
( C ) STRANDEDNESS: »h^e 
(0}TOFOLOGY^BBear 
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( I I )MOLECUl£TYFE:byACprab*) 
( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:287:

TCTT AG CTTT CC

( 2 ) INFORMATION FOR SEQ ID NO:288:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 22 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:288:

CACTTGTGCC CTCACTTTCA AC

( 2 ) INFORMATION FOR SEQ ID NO:289:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 49 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:289:

ATGCAATTAA CCCTCACTAA ACCCAGACAC TTGTGCCCTG ACTTTCAAC
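
As a quick consistency check on an entry such as SEQ ID NO:289, the declared LENGTH can be compared with the number of residue letters in the sequence line. The following Python sketch is illustrative only; the function names are hypothetical and are not part of the sequence listing.

```python
def count_bases(sequence_line):
    """Count residue letters, ignoring the spaces that group bases in tens
    and any trailing running-count digits."""
    return sum(1 for ch in sequence_line if ch.isalpha())


def length_matches(declared_length, sequence_line):
    """Return True when the declared LENGTH equals the counted residues."""
    return count_bases(sequence_line) == declared_length


# Sequence line of SEQ ID NO:289 as printed in the listing above.
seq_289 = "ATGCAATTAA CCCTCACTAA ACCCAGACAC TTGTGCCCTG ACTTTCAAC"

print(count_bases(seq_289))         # 49
print(length_matches(49, seq_289))  # True
```

A mismatch reported by such a check does not say which field is wrong; it only flags the entry for comparison against the printed patent.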

( 2 ) INFORMATION FOR SEQ ID NO:290:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:290:

CACCCTGGGC AACCACCCCT GTCGT

( 2 ) INFORMATION FOR SEQ ID NO:291:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 47 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:291:

TAATACOACT CACTATACCG AGCACCCTGG GCAACCACCC CTGTCGT

( 2 ) INFORMATION FOR SEQ ID NO:292:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:292:
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CTAGAATTCT GTTCACTCAG ATTGC 



( 2 ) INFORMATION FOR SEQ ID NO:293:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 27 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:293:

AAATCCATAC AATACTCCAC TATTTCC

( 2 ) INFORMATION FOR SEQ ID NO:294:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 27 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:294:

GATAAGCTTG GCCCTTATCT ATTCCAT

( 2 ) INFORMATION FOR SEQ ID NO:295:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 28 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:295:

ACCCATCCAA ACGAATGGAG CTTCTTTC

( 2 ) INFORMATION FOR SEQ ID NO:296:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:296:

AGCCTACCTC AA

( 2 ) INFORMATION FOR SEQ ID NO:297:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:297:

TCCOATCGAC TT

( 2 ) INFORMATION FOR SEQ ID NO:298:
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( I )SEOU£NC£aOUtACr£a(5nCS: 
( A ) LENGTH: 22 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:298:

CCCAATTAAC CCTCACTAAA OG 22

( 2 ) INFORMATION FOR SEQ ID NO:299:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 22 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:299:

AATTAACCCT CACTAAACCC AC

( 2 ) INFORMATION FOR SEQ ID NO:300:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 22 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:300:

TAATACCACT CACTATAGCC AG 22

( 2 ) INFORMATION FOR SEQ ID NO:301:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:301:

ATTTAGGTCA CACTATACAA 20

( 2 ) INFORMATION FOR SEQ ID NO:302:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:302:

GATKATATTT 10

( 2 ) INFORMATION FOR SEQ ID NO:303:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
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( i I )MOLECUl£TYPE:D.VA(prob<) 
( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:303:

ACAKGATATT

( 2 ) INFORMATION FOR SEQ ID NO:304:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:304:

AAONTCATAT

( 2 ) INFORMATION FOR SEQ ID NO:305:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:305:

AAANATCATA 10

( 2 ) INFORMATION FOR SEQ ID NO:306:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:306:

CAANOATCAT 10

( 2 ) INFORMATION FOR SEQ ID NO:307:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:307:

( 2 ) INFORMATION FOR SEQ ID NO:308:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:308:
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ACCNAACATG 

( 2 ) INFORMATION FOR SEQ ID NO:309:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:309:

CACKAAACAT

( 2 ) INFORMATION FOR SEQ ID NO:310:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:310:

AGAAACNACA 10

( 2 ) INFORMATION FOR SEQ ID NO:311:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 16 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:311:

ATTTCATTCT GTATTG 16

( 2 ) INFORMATION FOR SEQ ID NO:312:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 16 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:312:

CCGACTGCAC TCOTTA 16

( 2 ) INFORMATION FOR SEQ ID NO:313:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:313:

CCGACTGCAC TCGTT 15

( 2 ) INFORMATION FOR SEQ ID NO:314:
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( I ) SEQUENCE OURACTERtSTTCS: 
< A ) LENGTH: 15 bue pain 
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:314:

CCOACTACAG TCOTT

( 2 ) INFORMATION FOR SEQ ID NO:315:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:315:

CCCACTCCAC TCCTT

( 2 ) INFORMATION FOR SEQ ID NO:316:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:316:

CCCACTTCAG TCCTT

( 2 ) INFORMATION FOR SEQ ID NO:317:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:317:

CTAATTTCTT TTATAOTAGA AACCACAAAG GATAC

( 2 ) INFORMATION FOR SEQ ID NO:318:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:318:

CATTAAACAA AATATCATCT TTGGTCTTTC CTATC

( 2 ) INFORMATION FOR SEQ ID NO:319:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 32 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
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( I I ) MOLECULE TYFE:DXA(oU5«iBdeotid*) 
( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:319:

CATTAAAGAA AATATCATTC CTGTTTCCTA TG 32

( 2 ) INFORMATION FOR SEQ ID NO:320:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 18 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:320:

CATTAAACAA AATATCAT

( 2 ) INFORMATION FOR SEQ ID NO:321:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:321:

TATTAAAGAA AATATCATCT TTGGTOTTTC CTATC

( 2 ) INFORMATION FOR SEQ ID NO:322:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:322:

CCTTAAAGAA AATATCATCT TTGGTGTTTC CTAAA 35

( 2 ) INFORMATION FOR SEQ ID NO:323:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:323:

CTTTAAACAA AATAAAAAAA TTGGTGTTTC CTAAA 35

( 2 ) INFORMATION FOR SEQ ID NO:324:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:324:
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OGAAGTCTCC CATTTTAATT 

( 2 ) INFORMATION FOR SEQ ID NO:325:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:325:

CCTTCAOAGC CTAAAATTAA

( 2 ) INFORMATION FOR SEQ ID NO:326:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:326:

CCTTCACAOK CTAAAATTAA 20

( 2 ) INFORMATION FOR SEQ ID NO:327:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:327:

CCTTCAGAGT GTAAAATTAA 20

( 2 ) INFORMATION FOR SEQ ID NO:328:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 19 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:328:

CCTTCAGACG CTAAAATCA 19

( 2 ) INFORMATION FOR SEQ ID NO:329:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 19 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:329:

CCTTCAGACG CTAAAATTA 19

( 2 ) INFORMATION FOR SEQ ID NO:330:
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( I ) SEQUENCE CKARACTERtSnCS: 
( A ) LENGTH: 19 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:330:

CATTCAOAGT GTAAAATAC

( 2 ) INFORMATION FOR SEQ ID NO:331:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 19 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:331:

AAAAAACACT CTAAAATCA

( 2 ) INFORMATION FOR SEQ ID NO:332:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:332:

CATTAAACAA AATAACATCA TTGGTOTTTC CTATG

( 2 ) INFORMATION FOR SEQ ID NO:333:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 648 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:333:



AACAAACCTA CCCACCCTTA ACACTACATA GTACATAAAC CCATTTACCG TACATACCAC 60

ATTACACTCA AATCCCTTCT CGTCCCCATG GATGACCCCC CTCAGATAGG GGTCCCTTGA 120

CCACCATCCT CCCTCAAATC AATATCCCCC ACAAGACTGC TACTCTCCTC CCTCCGOGCC 180

CATAACACTT GCCCGTACCT AAAGTCAACT GTATCCGACA TCTCOTTCCT ACTTCACOOT 240

CATAAACCCT AAATAGCCCA CACGTTCCCC TTAAATAAGA CATCACOATG CATCACAGCT 300

CTATCACCCT ATTAACCACT CACGGGAGCT CTCCATGCAT TTGGTATTTT CGTCTGCCGG 360

CTATOCACGC GATAGCATTG CGACACGCTG CAGCCCGACC ACCCTATGTC OCAGTATCTG 420

TCTTTCATTC CTCCCTCATC CTATTATTTA TCGCACCTAC CTTCAATATT ACACGCCAAC 480

ATACTTACTA AAGTGTOTTA ATTAATTAAT GCTTGTAGCA CATAATAATA ACAATTCAAT 540

CTCTCCACAO CCACTTTCCA CACACACATC ATAACAAAAA ATTTCCACCA AACCCCCCCT 600

CTCCCCCCCT TCTGCCCACA CCACTTAAAC ACATCTCTGC CAAACCCC 648



( 2 )INF0R.VCAn0KF0RSSQIDNa33^ 
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( I ) SEQUENCE CKABACTERtSnCS: 
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:334:

OATCCTOAOG AC

( 2 ) INFORMATION FOR SEQ ID NO:335:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:335:

CTCCTCCCCO CT 12

( 2 ) INFORMATION FOR SEQ ID NO:336:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:336:

ACTCCTCCCC CC 12

( 2 ) INFORMATION FOR SEQ ID NO:337:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:337:

CACTCCTCCC CO 12

( 2 ) INFORMATION FOR SEQ ID NO:338:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:338:

CGACTCCTCC CC 12

( 2 ) INFORMATION FOR SEQ ID NO:339:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

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( 1 I ) MOLECULE TYTE: DXA (probe) 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:339:

ACipACTCCTC cc

( 2 ) INFORMATION FOR SEQ ID NO:340:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:340:

TACCACTCCT CC

( 2 ) INFORMATION FOR SEQ ID NO:341:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:341:

CTACCACTCC TC 13

( 2 ) INFORMATION FOR SEQ ID NO:342:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:342:

TCTACCACTC CT 13

( 2 ) INFORMATION FOR SEQ ID NO:343:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:343:

TTCTACCACT CC 13

( 2 ) INFORMATION FOR SEQ ID NO:344:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:344:
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ATTCTACGAC TC 



( 2 ) INFORMATION FOR SEQ ID NO:345:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:345:

TATTCTACGA CT

( 2 ) INFORMATION FOR SEQ ID NO:346:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:346:

CTATTCTACC AC

( 2 ) INFORMATION FOR SEQ ID NO:347:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:347:

CCTATTCTAC CA

( 2 ) INFORMATION FOR SEQ ID NO:348:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:348:

TCCTCCCCCG

( 2 ) INFORMATION FOR SEQ ID NO:349:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:349:

CTCCTCCCCC 10

( 2 ) INFORMATION FOR SEQ ID NO:350:
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( I ) SEQUENCE CKARACTEUSnCS: 
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:350:

ACTCCTCCCC 10

( 2 ) INFORMATION FOR SEQ ID NO:351:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:351:

CACTCCTCCC

( 2 ) INFORMATION FOR SEQ ID NO:352:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:352:

COACTCCTCC 10

( 2 ) INFORMATION FOR SEQ ID NO:353:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:353:

ACCACTCCTC 10

( 2 ) INFORMATION FOR SEQ ID NO:354:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:354:

TACCACTCCT 10

( 2 ) INFORMATION FOR SEQ ID NO:355:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear
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( I I )MOLEa;i£TVPE:D.VA(frol>0 
( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:355:

CTACCACTCC

( 2 ) INFORMATION FOR SEQ ID NO:356:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:356:

TCTACOACTC 10

( 2 ) INFORMATION FOR SEQ ID NO:357:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:357:

TTCTACCACT 10

( 2 ) INFORMATION FOR SEQ ID NO:358:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:358:

ATTCTACGAC 10

( 2 ) INFORMATION FOR SEQ ID NO:359:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:359:

TATTCTACGA

( 2 ) INFORMATION FOR SEQ ID NO:360:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 184 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:360:
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•coottoued 



TACTCCCCTO CCCTCAACAA OATCTTTTCC CAACTOCCCA ACACCTOCCC TCTGCAOCWp 60

KCOOWWCATT CCACACCCCC OCCCCCCACC CCCOTCCCCO CCATOOCCAT CTACAAOCAO 120

TCACACCACA TPACCOACCW WOKO 

SAYo''--' ■• ' '■ •■; ■ • •. ■•■ • . ■ 



1.8 0 



YaH f ,• . . 3'-TAGTAGXAACCACAA(SEQID. N0■13V 
1. An array of oligODUcIeotide probes immobaized oa a 3'-AGtAGAXACCACAAA rSEO ID isTo idV 
solid supppri. said array baviog at least 100 probes and oo 3'-GTAGAAXCCACAAAG (slo m NO i 
more than 100.000 different oUgonudeoU'de probes 9 to 20 I'-TAGAAAXCAC^S Slo m Kn S . 
nucleot,desinIengthocc«pyingseparatekno,Jrositesins.id „ l-AGA^SSSO^^^^^ u • 
array, said oligonucleotide probes comprising at least four ZX\^Z!T -a t ^^^9 JP' 
sets of probes: (1) a first sei that is exactly complementary ctfrTi'^T r^"^ ^ ^^'^i^^'"*''/ A. G. 
to a reference sequence and comprises probes that com- T ^ 'f^ , . 

pletely span the reference sequence and. relative to the • ^- ""^^^ "Jf*^'"™ l'^'ie«"J "id reference sequence 

reference sequence, overlap one another in sequence; and (2) nw/*^"'"** of a D-Ioop region of human mitochondrial 

three additional sets ofprobes, each of which is identical to ,„ 

said first set of probes but for at least one different . l". The array .of claim 9, wherein said probes are 15 

nucleotide, which different nucleotide is located in the same nuc'eoU'des in length, and said array comprises a first set of 

position in each of the three additional sets but which is a probes exactly complementary to a sequence contained in a 
different nucleotide in each seL ^ sequence bounded by positions 16280 to 356 of the rsfer- 

2. The array of claim 1. further comprising a fourth ence sequence and four additional sets of probes identical to 

additional set of probes, which fourth additional set is saidfirstsetbutforposition7,relativetoa3'-endofaprobc 

identKal to probes in the first set. which 3'-end is covalendy attached to the substrate, where.' 

• A^i °^ wherein said reference sequence for each of the four addiuonal probe sets, a different nucle- 
is a double-stranded nucleic acid and probes complementary jq otide is located, such that, for each probe in said first set 

to t«th strands of said reference are in said array «here is an identical probe.in one of the four additional sets' 

nuJ,Sderfe',S"°''"^^ :t^:to^deprobr' ''^>''' 

„..fi'3-? "^'f P"''^ of '='aira 1. wherein said reference sequence 

nucleotides m length and attached by a covalent linkage to 35 is a sequence from an exon of a human p53 gene^ 
aateona3-endofsaidprobes,andsaiddifferentnucleotide 12. He array of claim 11 wherei^ said reference 
'''^'^^'"l^^'^l-'^'^^^^ sequence comprL at least "'dOnurotfde^^g^^ 

• c ^^t^'^ reference sequence sequence from exon 6 of a p53 gene 

fnSTm^ , '"^y °f H. wh«"n said reference 

^^^•^''^Sonuclcotxdc probe 10 to 18 nucleotides in « sequence is exon 5 of a p53 gene, said probes a^ 17 

7 Ti,.,„,„„t , • , . . .■ nucleotides long, and said array comprises a first set of 

nf n'rnhL Z ' • • ' ^''"^ '^"^ ""^^ compnscs a set probes exactly complementary to said sequence and at least 
£ th^.™uS'!f;^*'^''^""''-''^'f "^'^ "^"^ °f "^l- setwmprising probes 

3"™^AG «lo°m NO^,^T' '■'."'•''^ ^ « nucleotide at p<liL 7. 

I^TtSSSgaS m '^^^TJ° °^ ' P™*^' 3'-end is covalently 

J-]^G?2SiiiSS:KSf3S; anucl.»udeat.his;x«itioninaco.respondingp 

sSoScS^^glSS K'i M. -me array.ofdaim l. wherein said probes are 61i. 

i.^?A^?VCw5S^,?f*' 50 godeoxyribonudeoddes. 

3'-?aSa^^SJ fpSS'v?^^' . IS-THearrayofdaimLwhereinsaidarrayhasbetween 
rArA^^^^^!.!St'^-^°"^'^'"'' 10.000 and 100,000 probes. 

'^J^^^u ^^^S 1°: ^^°=^^°>' «"»y of ^l""" 1. wl"""" reference sequence 

• comprises 4 probes, and X is individually A, G, C, and T is from a human immunodefidency vims. 

/lllar^vorM,! < u • , " . ofclaim 16, wherein the reference Sequence 

eo«i«te f"^ • ^""P °^ ff°° a vetse transcriptase gene of the human immuno- 

L ■ ■ deficiency virus. 

V TA^^^^^n^ ^- ^'PPort » linker. 

3 -TATAGTXGAAACCAC (SEQ id. N0:11); 

3'-ArAGTAXAAACCACA(SEQID.NO:l2); ♦ . , , , 
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[57] ABSTRACT 

Oligonucleotide analogue arrays attached to solid substrates 
and methods related to the use thereof are provided. The 
oligonucleotide analogues hybridize to nucleic acids with 
either higher or lower specificity than corresponding 
unmodified oligonucleotides. Target nucleic acids which 
comprise nucleotide analogues are bound to oligonucleotide 
and oligonucleotide analogue arrays. 
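
The probe-set layout recited in claim 1 (a first set exactly complementary to a reference sequence and tiled so that consecutive probes overlap, plus three additional sets that differ from the first set at one fixed position) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the reference sequence, probe length, and substitution position are hypothetical toy values, the complement is written base by base opposite the reference, and the way the three alternative bases are distributed among the additional sets is one possible choice rather than one taken from the specification.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def complement(seq):
    """Base-by-base complement, written opposite the reference strand."""
    return "".join(COMPLEMENT[base] for base in seq)


def first_probe_set(reference, probe_len):
    """One exactly complementary probe per start position, so that the set
    spans the reference and neighbouring probes overlap by probe_len - 1."""
    return [complement(reference[start:start + probe_len])
            for start in range(len(reference) - probe_len + 1)]


def additional_probe_sets(first_set, position):
    """Three sets identical to the first set except at `position`, where
    each set carries a different one of the three non-matching bases."""
    sets = [[], [], []]
    for probe in first_set:
        alternatives = [b for b in BASES if b != probe[position]]
        for k, base in enumerate(alternatives):
            sets[k].append(probe[:position] + base + probe[position + 1:])
    return sets


reference = "ACGTTGCAAGCT"   # hypothetical toy reference, not from the listing
first = first_probe_set(reference, probe_len=9)
extra = additional_probe_sets(first, position=4)

for row in zip(first, *extra):
    print(row)
```

With four probes per reference position, comparing hybridization signals across the four sets at each position indicates which base is present in the target; a real array would of course need at least the 100 probes the claim requires.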
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ADDAVs nv MnniPTT-n NUCLEIC ACID density of more than 100 members at known locations per 

^ iSSot USE cm^ or more preferably, more than 1000 members per cm^ 

PROBES AND METHODS OF U5L ^ embodiments, the arrays have a density of more 

CROSS-REFERENCE TO RELATED than 10.000 members per cm*. 

APPLICAIION S The solid substrate upon which the array is constructed 

includes any material upon which oligonucleotide analogues 

This appUcaiion is a cbntinuation-m-part of U.S. Ser. No. juached in a defined relationship to one another, such as 

08/440,742 filed May 10. 1995 abandoned, which is a ^ ^jj^g^ Especially preferred oligonucle- 

continuation-in-part of PCT application (designating the ^^^^ 'analogues of the array are between about 5 and about 

United States) SN PCT/US94/1230S filed Oct. 26. 1954. jO nucleotides, nucleotide analogues or a mixture thereof in 
which is a continuation-in-part of U.S. Ser. No. 08/284,064 

filed Aug. 2, 1994 ab'indoned wto^^^ a continuation^m- ■ embodiments, nucleoside analogues 

part of U.S. Ser. No 08/143,312 filed Oct. 26 1993 °° Jj^ P oligonucleotide analogues of the array 

abandoned, each of which is incorporated herem by refer- ^^^^^ 

ence in its entirety for all purposes. 15 

FIELD OF THE INVENTION 




The present invention provides probes comprised of nucleotide
analogues immobilized in arrays on solid substrates for analyzing
molecular interactions of biological interest, and target nucleic
acids comprised of nucleotide analogues.

alkylthio. halogen (Fluorine, Chlorine, and Bromine). 

BACKGROUND OF THE INVENTION cyano, and azido. and wherein Y is a heterocyclic moiety, 

e g , a base selected from the group consisting of purmes, 
Hie development of very large scale mimobilized poly- analogues, pyrimidines, pyrimidine analogues, um- 
mer synthesis (VLSIPS™) technology provides pioneenog ^ ^^^^^ ^^^ ^ 5-nitroindole) or other groups or ring 
methods for arranging large numbers of oligonucleotide ^y^^^j^ capable of forming one or more hydrogen bonds 
probes in very small arrays. See, U.S. applicationSer. No. corresponding moieties on alternate strands within a 
07/805,727 now U.S. Pat. No. 5.424,186 and PCT patent ^^^i^, triple-stranded nucleic acid or nucleic acid 
publication Nos. WO 90/15070 and 92/10092. each of wluch ^^^^^^^^ ^ther groups or ring systems capable of forming 
is incorporated herein by reference for aU purposes. U.S. jj^arest-neighbor base-stacking interacdons within a double- 
patent appUcation Ser. No. 08/082.937, filed Jun. 25, 1993, triple-stranded complex. In other embodiments, the oli- 
and incorporated herein for all purposes, describes methods ^cleotide analogues are not constructed from 
for making arrays of oUgonucleotide probes that are used, nucleosides, but are capable of binding to nucleic acids in 
e.g., to determine the complete sequence of a target nucleic ^j^joj, jue to structural similarities between the oligo- 
acid and/or to detect the presence of a nucleic acid with a ^ n^j^leotide analogue and a naturally occurring nucleic acid, 
specified sequence. ^ example of such an oligonucleotide analogue is a peptide 
VLSIPS'f" technology provides an efficient means for nucleic acid or polyamide nucleic acid in which bases which 
large scale production of miniaturized oligonucleotide hydrogen bond to a nucleic acid are attached to a polyamide 
arrays for sequencing by hybridization (SBH), diagnostic backbone. 

testing for inherited or somatically acquired genetic present invention also provides target nucleic acids 
diseases, and forensic analysis. Other applications include hybridized to oligonucleotide arrays. In the target nucleic 
determination of sequence specificity of nucleic acids, acids of the invention, nucleotide analogues are incorporated 
protein-nucleic acid complexes and other polymer-polymer ^^^^ nucleic acid, altering the hybridization prop- 
interactions, erties of the target nucleic acid to an anay of oligonucleotide 

probes. Typically, the oligonucleotide probe arrays also comprise nucleotide analogues.

SUMMARY OF THE INVENTION

The present invention provides arrays of oligonucleotide analogues attached to solid
substrates. Oligonucleotide analogues have different hybridization properties than
oligonucleotides based upon naturally occurring nucleotides. By incorporating
oligonucleotide analogues into the arrays of the invention, hybridization to a target
nucleic acid is optimized.

The target nucleic acids are typically synthesized by providing a nucleotide analogue
as a reagent during the enzymatic copying of a nucleic acid. For instance, nucleotide
analogues are incorporated into polynucleic acid analogues using Taq polymerase in a
PCR reaction. A nucleic acid containing a sequence to be analyzed is typically
amplified in a PCR or RNA amplification procedure

number or variety of compounds to be ^^ against the anajcge array ^^^^^^ 

members. Inothergroupsof embodiments, thearrayshave r''"^,*"'^'!^"^'^^ advantage of 
between 100 and 10 000 members, and m^^^^^^^ « jSg't L^^^Tar^^t ^leic acid with gLter 

^emtS^nT^c'fer!^em^bS^^^^^^^^ LgevUyV -derilig it res.tant to enzymatic degradation. 
For example, analogues comprising 2'-O-methyloligoribonucleotides are resistant to
RNAase A.

Oligonucleotide analogue arrays are optionally arranged into libraries for screening
compounds for desired characteristics, such as the ability to bind a specified
oligonucleotide analogue, or oligonucleotide analogue-containing structure. The
libraries also include oligonucleotide analogue members which form
conformationally-restricted probes, such as unimolecular double-stranded probes or
unimolecular double-stranded probes which present a third chemical structure of
interest. For instance, the array of oligonucleotide analogues optionally include a
plurality of different members, each member having the formula: Y-L1-X1-L2-X2, wherein
Y is a solid substrate, X1 and X2 are complementary oligonucleotides containing at
least one nucleotide analogue, L1 is a spacer, and L2 is a linking group having
sufficient length such that X1 and X2 form a double-stranded oligonucleotide. An array
of such members comprise a library of unimolecular double-stranded oligonucleotide
analogues. In another embodiment, the members of the array of oligonucleotide are
arranged to present a moiety of interest within the oligonucleotide analogue probes of
the array. For instance, the arrays are optionally conformationally restricted, having
the formula -X11-Z-X12 wherein X11 and X12 are complementary oligonucleotides or
oligonucleotide analogues and Z is a chemical structure comprising the binding site of
interest.

optionally derived from natural sources, but is often synthesized chemically. It is of
any size. An "oligonucleotide analogue" refers to a polymer with two or more monomeric
subunits, wherein the subunits have some structural features in common with a naturally
occurring oligonucleotide which allow it to hybridize with a naturally occurring
oligonucleotide in solution. For instance, structural groups are optionally added to
the ribose or base of a nucleoside for incorporation into an oligonucleotide, such as a
methyl or allyl group at the 2'-O position on the ribose, or a fluoro group which
substitutes for the 2'-O group, or a bromo group on the ribonucleoside base. The
phosphodiester linkage, or "sugar-phosphate backbone" of the oligonucleotide analogue
is substituted or modified, for instance with methyl phosphonates or O-methyl
phosphates. Another example of an oligonucleotide analogue for purposes of this
disclosure includes "peptide nucleic acids" in which native or modified nucleic acid
bases are attached to a polyamide backbone. Oligonucleotide analogues optionally
comprise a mixture of naturally occurring nucleotides and nucleotide analogues.
However, an oligonucleotide which is made entirely of naturally occurring nucleotides
(i.e., those comprising DNA or RNA), with the exception of a protecting group on the
end of the oligonucleotide, such as a protecting group used during standard nucleic
acid synthesis is not considered an oligonucleotide analogue for purposes of this
invention.

A "nucleoside" is a pentose glycoside in which the

Oligonucleotide analogue arrays are synthesized on a solid substrate by a variety of
methods, including light-directed chemical coupling, and selectively flowing synthetic
reagents over portions of the solid substrate. The solid substrate is prepared for
synthesis or attachment of oligonucleotides by treatment with suitable reagents. For
example, glass is prepared by treatment with silane reagents.

The present invention provides methods for determining whether a molecule of interest
binds members of the oligonucleotide analogue array. For instance, in one embodiment, a
target molecule is hybridized to the array and the resulting hybridization pattern is
determined. The target molecule includes genomic DNA, cDNA, unspliced RNA, mRNA, and
rRNA, nucleic acid analogues, proteins and chemical polymers. The target molecules are
optionally amplified prior to being hybridized to the array, e.g., by PCR, LCR, or
cloning methods.

The oligonucleotide analogue members of the array used in the above methods are
synthesized by any described method for creating arrays. In one embodiment, the
oligonucleotide analogue members are attached to the solid substrate, or synthesized on
the solid substrate by light-directed very large scale immobilized polymer synthesis,

aglycone is a heterocyclic base; upon the addition of a phosphate group the compound
becomes a nucleotide. The biological nucleosides are β-glycoside derivatives of
D-ribose or D-2-deoxyribose. Nucleotides are phosphate esters of nucleosides which are
generally acidic in solution due to the hydroxy groups on the phosphate. The
nucleosides are connected together via phosphate groups attached to the 3' position of
one pentose and the 5' position of the next pentose. Nucleotide analogues and/or
nucleoside analogues are molecules with structural similarities to the naturally
occurring nucleotides or nucleosides as discussed above in the context of
oligonucleotide analogues.

A "nucleic acid reagent" utilized in standard automated oligonucleotide synthesis
typically carries a protected phosphate on the 3' hydroxyl of the ribose. Thus, nucleic
acid reagents are referred to as nucleotides, nucleotide reagents, nucleoside reagents,
nucleoside phosphates, nucleoside-3'-phosphates, nucleoside phosphoramidites,
phosphoramidites, nucleoside phosphonates, phosphonates and the like. It is generally
understood that nucleotide reagents must carry a reactive, or activatible, phosphoryl
or phosphonyl moiety in order to form a phosphodiester linkage.

e.g., using photo-removable protecting g^^P^.^^^^^g^^y^^ '° "protecting group" as used herein, refers to any of the 

thesis. In anotherembodmient, the oUgonucieotiae memoers r . - . "designed to block one reactive site in a 

are attached to the solid substrate by ^''^^i^f^'^^^l Sle whUe a chenlcal reaction is carried out at another 

channel adjacent to the surface of said s^^^U^* Pl^^^^^^ ^^ctive site. More particularly, the protecting groups used 

selected monomers in said channels to synthesize ohgo- ^^^^^ 

nucleotide analogues at predetermined poruons of selected ""^^"^ ^ ^p^J^ve Groups In Organic Chenust^^ 2nd 

regions, wherein the portion of the selected regions com- ^ ' ^ york, NY. 1991, which is 

prise oligonucleotide analogues different from ohgonuck- ^ ^ ^y reference. The proper selection of 

otide analogues m at least one other of the selected re^ons ^^^^ ^ ^^^^^ ^ ^ ^^^^^ by 

and repeating the steps with the channels formed riong a P^^ ^^^^kxIs employed in the synthesis. For example, 
second portion of the selected regions. TTie»hd subs^te « «> l\'^^^^^^„ f^^^^^ ^isc^^ b^rein, the protect- 

any suitable matenal as '^^^r^^^.^^Zll^^^mtf fag g^ups are photolabae protecting groups such as NVOC. 

shdes. and arrays, each of which is constructed from, e.g., ^eNPocV and those disclosed in co-pending Application 

silica, polymers and glass.

DEFINITIONS

An "Oligonucleotide" is a nucleic acid sequence composed of two or more nucleotides.
An oligonucleotide is

PCT/US93/10162 (filed Oct. 22, 1993), incorporated herein by reference. In other
methods, protecting groups are removed by chemical methods and include groups such as
FMOC, DMT and others known to those of skill in the art.
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A "puriac" is a generic tcnn based upon the specific synthesizing oligonucleolides and ^f.^* 

compound "purine" having a skeletal structure derived from logues are found m. for exampU, fjf ^^^^^^ 

the feion of a pyrimidine ring and an imidazole ring. It is A PracUcal Approach Gait, ^f '^.f^^^'^^'f^^^ 

generally, and herein, used to describe a generic class of H A^Kmjpers A^uc/.,c Aads ^^--J l^JJ' ^197 

impounds which have an atom or a group of atoms added 5 (1994); IC L. ^ueholm 7 Or^^Cft^m^ r^ZIX 

to th^parent purine compound, such as the bases found in (1994). and S. Agrawal ed.) A/e//zo^ '^^tr.f/hvS' 

the naturally occurring nucleic acids adenine volume 20. each of which is mcorporated herem by ^^^^^^^ 

(6.aminopurine)andguanine(2.amino.6-oxopurine),orless ence in its entirety ^^^/^ P^^*^' ^X^^h^^^^^ 

^mmonly occurring molecules such as a-amino-adenine, lecular double-stranded DNA.^^^.^^^j^^" ^^^^^ 

N«-methyladcnine. or 2.methylguanine. lO descnbed. See, copendmg application Scr. No. 08/327.687. 

r* mcmyiducumc. 5,556.752 which is incorporated herem 

A "purine analogue" has a heterocychc nng with struc- ^ " ■'^^ » 

mral simUarities to a purine, in which an atom or group of tor aU purposes. ^ .,,,„c «f 

atoms is substituted for an atom in the purine ring. For Improved methods of forming /Jf^^^^^^^ 

instance in one embodiment, one or more N atoms of the oUgonucleotides. pepUdes and other polymer sequences 

Se h^ tero^^^^^^ ring are replaced by C atoms, with a minimal number of synthetic steps are known. Se^ 

purine ncicruuycuu I ^ / . , ^ PirruHE et al.. U.S. Pal. No. 5.143,854 (see also, PCT 

A "pyrimidine" is a compound with a specific heterocy- ^^^^^^f^^Vo WO 90/15070) and Fodor et al.. PCT 

cUcdiazineringstmcture,butisusedgenericaUybyper^ Apf^^-^^^^ No WO ^ ) ^^^^ 

of skill and herem to refer to any <«^P°^d hav^^^^^ ^'i^n by reference, which disclose methods of forming vast 

U^iazine nng with minor additions, such as the common V oligonucleotides and other molecules 

nucleic acid bases cytosine /liy°^.>°^' ^^J^^ for example Ughl-directed synthesis techniques. See 

5-methylcytosine and 5.hydroxymethylcytosme. or the non- ^^'^^^^^^^ (1991) Science; 251. 767-77 which is 

naturaUy occurring 5-bromo-uraal. incorporated herein by reference for aU purposes. These 

A "pyrimidine analogue" is a compound with structural procedures for synthesis of polymer arrays are now referred 

similarity to a pyrimidine, in which one or more atom in the ^5 ^ VLSIPS™ procedures. 

pyrimidine ring is substituted. Fo^J^^s|l°«' ^sing the VLSI?- approach, one heterogenous array of 

embodmient, one or more of the N atoms of the rmg are ^ converted, (hrough simultaneous coupling at a 

substituted with C atoms. ^^^^^^ ^^^^^.^^ ^.^^^ ^ different heterogenous array. 

A "solid substrate" has fixed organizational support appUcation Ser. No. 07/796,243 now U.S. Pat. No. 

matrix, such as silica, polymeric materials, or glass. In some 30 ^^^^^^ ^nd U.S. application Ser. No. 07/980.523 now 

embodiments, at least one surface of the substrate is partially ^ g ^ ^^^^ disclosures of which are incor- 

planar. In other embodiments it is desirable to physically j^^^j^ '^^^ ^jj purposes. . 

separate regions of the substrate to delineate synthetic regions, for example with
trenches, grooves, wells or the like. Examples of solid substrates include slides,
beads and arrays.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows four panels (FIG. 1A, FIG. 1B, FIG. 1C and FIG. 1D). FIGS. 1A and 1B
graphically display the difference in fluorescence intensity between the matched and
mismatched DNA probes. FIGS. 1C and 1D illustrate the difference in fluorescence
intensity versus location on an example chip for DNA and RNA targets, respectively.

FIG. 2 is a graphic illustration of specific light-directed chemical coupling of
oligonucleotide analogue monomers to an array.

FIG. 3 shows the relative efficiency and specificity of hybridization for immobilized
probe arrays containing adenine versus probe arrays containing 2,6-diaminopurine
nucleotides. (3'-CATCGTAGAA-5' (SEQ ID NO:1)).

FIG. 4 shows the effect of substituting adenine with 2,6-diaminopurine (D) in
immobilized poly-dA probe arrays. (AAAAANAAAAA (SEQ ID NO:2)).

The development of VLSIPS™ technology as described in the above-noted U.S. Pat.
No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092 is considered
pioneering technology in the fields of combinatorial synthesis and screening of
combinatorial libraries. U.S. patent application Ser. No. 08/082,937, filed
Jun. 25, 1993 (incorporated herein by reference), describes methods for making arrays
of oligonucleotide probes that are used to check or determine a partial or complete
sequence of a target nucleic acid and to detect the presence of a nucleic acid
containing a specific oligonucleotide sequence.

Combinatorial Synthesis of Oligonucleotide Arrays

VLSIPS™ technology provides for the combinatorial synthesis of oligonucleotide arrays.
The combinatorial VLSIPS™ strategy allows for the synthesis of arrays containing a
large number of related probes using a minimal number of synthetic steps. For instance,
it is possible to synthesize and attach all possible DNA 8mer oligonucleotides (4^8, or
65,536 possible combinations) using only 32 chemical synthetic steps. In general,
VLSIPS™ procedures provide a method of producing 4^n different oligonucleotide probes
on an array using only 4n synthetic steps.
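
By way of illustration only, the 4^n probes in 4n steps relationship stated above can be
checked with a short simulation. The sketch below is not the VLSIPS™ procedure itself;
it assumes an idealized array in which each masked coupling step adds one base to
exactly those sites whose intended sequence calls for that base at the current
position. The function name synthesize_all_nmers is hypothetical.

```python
from itertools import product

BASES = "ACGT"


def synthesize_all_nmers(n):
    """Simulate masked synthesis of every n-mer; return (probes, step count)."""
    # Each array site is identified by the sequence it is meant to carry.
    targets = ["".join(p) for p in product(BASES, repeat=n)]
    grown = {t: "" for t in targets}  # chains start empty at every site
    steps = 0
    for position in range(n):      # one layer per probe position
        for base in BASES:         # one mask and one coupling per base
            for t in targets:
                # The mask exposes only sites whose next base is `base`.
                if t[position] == base:
                    grown[t] += base
            steps += 1
    return grown, steps


if __name__ == "__main__":
    probes, steps = synthesize_all_nmers(3)
    print(len(probes), steps)  # 64 12
    print(4 ** 8, 4 * 8)       # 65536 32, the 8-mer case given in the text
```

Each of the n positions needs one coupling step per base, which is where the factor of
4n comes from, while the number of distinct sites grows as 4^n.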

FIG. 5 shows the effects of substituting 5-propynyl-2'-deoxyuridine and
2-amino-2' deoxyadenosine in AT arrays on hybridization to a target nucleic acid.
(ATATAATATA (SEQ ID NO:3) and CGCGCCGCGC (SEQ ID NO:4)).

FIG. 6 shows the effects of dI and 7-deaza-dG substitutions in oligonucleotide arrays.
(3'-ArGTr(GlG2G3G4G5) rnr.GT-5' (SEQ ID NO:5)).

DETAILED DESCRIPTION

Methods of synthesizing desired single stranded oligonucleotide and oligonucleotide
analogue sequences are known to those of skill in the art. In particular, methods of

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a
glass surface proceeds using automated phosphoramidite chemistry and chip masking
techniques. In one specific implementation, a glass surface is derivatized with a
silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked
by a photolabile protecting group. Photolysis through a photolithographic mask is used
selectively to expose functional groups which are then ready to react with incoming
5'-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with
those sites which are illuminated (and thus exposed by removal of the photolabile
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blocking group). Hus. the phosphoramidites only add to sequence Examples i"'"*^' ^-me^^^^^^^ 

those are^ selectively exposed from the preceding step. S-propynyl-. 5;(i?"dazol-2-y )-and 5-(thuzol-2 yl) 

locations on the array is determined by the pattern of illumination during synthesis
and the order of addition of coupling reagents.

In the event that an oligonucleotide analogue with a polyamide backbone is used in the
VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite chemistry to
perform the synthetic steps, since the monomers do not attach to one another via a
phosphate linkage. Instead, peptide synthetic methods are substituted. See, e.g.,
Pirrung et al. U.S. Pat. No. 5,143,854.

Peptide nucleic acids are commercially available from, e.g., Biosearch, Inc. (Bedford,
Mass.) which comprise a polyamide backbone and the bases found in naturally occurring
nucleosides. Peptide nucleic acids are capable of binding to nucleic acids with high
specificity, and are considered "oligonucleotide analogues" for purposes of this
disclosure. Note that peptide nucleic acids optionally comprise bases other than those
which are naturally occurring.

Sigma Chemical Company (Saint Louis, Mo.), R&D Systems (Minneapolis, Minn.), Pharmacia
LKB Biotechnology (Piscataway, N.J.), CLONTECH Laboratories, Inc. (Palo Alto, Calif.),
Chem Genes Corp., Aldrich Chemical Company (Milwaukee, Wis.), Glen Research, Inc.,
GIBCO BRL Life Technologies, Inc. (Gaithersburg, Md.), Fluka Chemica-Biochemika
Analytika (Fluka Chemie AG, Buchs, Switzerland), Invitrogen, San Diego, Calif., and
Applied Biosystems (Foster City, Calif.), as well as many other commercial sources
known to one of skill. Methods of attaching bases to sugar moieties to form nucleosides
are known. See, e.g., Lukevics and Zablocka (1991), Nucleoside Synthesis: Organosilicon
Methods, Ellis Horwood Limited, Chichester, West Sussex, England and the references
therein. Methods of phosphorylating nucleosides to form nucleotides, and of
incorporating nucleotides into oligonucleotides are also known. See, e.g., Agrawal (ed)
(1993) Protocols for Oligonucleotides and Analogues, Synthesis

Hybridization of Nucleotide Analogues ^5 Properties, Methods in ^.^^^^.^^^^^ 

, f .HhPfwP..nRNA<;nrDNAs Humana Press. Towota, NX, and the references therem. Sec 

arf ti?efa?r? fr Jhr o'Ifer ff also. Crooke and Lebleu. and Sanghvi and Cook, and the 

RNA RNA>RNA DNA>DNA:DNA, in solution. Long references cited therein, both supra. 
?lefi^vet«er duplex stabiKty wiih a target, but poorer Groupsare also linked ™ P^^^ ^^^^^^^^^^ 

mismatch discrimination than shorter probes (mismatch 30 side sugar nng or on the purme or pynmidme rings which 

diSSatioTX to the measure hybridization signal may stabilize the duplex by «lf '°f 

r^T^n a perfect match ptobe and a single base the negatively charged phosphate backbone or th^u^ 

m^atch probe Shorter probes (e.g., 8-mers) discriminate hydrogen bonding mteractions m the major and minor 

m^ chrverywSbut'theoverallduplexsU^ groves. Forexample. adenosme and ^^^^'l^^^^^^^ 

rnZer to optimize mismatch discrimination and duplex 35 are optionally substituted at the posuion w«h an mida- 

stabmtv the present invention provides a variety of nude- zolyl propyl group, increasmg duplex stabili^. UmveRal 

3SoL«1^^^^rated into polymers and attached in base analogues such as 3-mtropyrrole and 5-nitiomdole are 

otide analogues incorporaicQ miu pu y ootionally included in oligonucleotide probes to improve 

an array to a solid substrate XSbility through bL stacking interactions. 

Altering the thermal stability (TJ of the duplex foraied ^ J length of oUgonucleotide probes is also an 

have 2 hydrogen bonds per base-pair, ^h?"^ G-C ^/'^'y °^ "^^^y^ ^ ^cx stability for short 

duplexes have 3 hydrogen bonds pcr base pair n hetero- °^^°^"^j,3j^^,„^„^«jifi„,ion^of the sugar moiety in 

geneous oUgonucleoUde arrays m ^^'^'ch J« « 'sj'^^n^^^^ EuTSs provide useful stabilization, and these can 
uniform distribution of bases, it can be difficult to opt^ce ^ ,2^5, of probes for complementary 

hybridization«,nditionsforallprobes^^^^^^^^^ 50 be tg ^^en^ Z^^^^^^^^ ^'-O-metV. 2V0- 

insomeembodmients..tisdesu^aWe todes^^^^^^^ J andT-0-aUylK,ligoribonucleotides have higher 

duplexesand^ortoincreasethestabUit^^^^^^^ fflg affinities for compkmenUry RNA sequences than 

whUemamUming the sequence specificity of bybndiMUoa ^/"e^S counterparts. Probes comprised of 

This is accomplished, e.g., by replaong one or more of the ""T^^^^^^^^ also form more 

TcUic acids to enhance or decrease overall duplex slabUity interactions. For example. substiniUng P^P''°«' «f ^ 3" 
duplex stabilization. The phosphate diester backbone has been replaced with a variety
of other stabilizing, non-natural linkages which have been studied as potential
antisense therapeutic agents. See, e.g., Crooke and Lebleu (eds) (1993) Antisense
Research Applications CRC Press; and, Sanghvi and Cook (eds) (1994) Carbohydrate
modifications in Antisense Research ACS Symp. Ser. #580 ACS, Washington DC. Very stable
hybrids are formed between nucleic acids and probes comprised of peptide nucleic acids,
in which the entire sugar-phosphate backbone has been replaced with a polyamide
structure.

Another important factor which sometimes affects the use of oligonucleotide probe
arrays is the nature of the target nucleic acid. Oligodeoxynucleotide probes can
hybridize to DNA and RNA targets with different affinity and specificity. For example,
probe sequences containing long "runs" of consecutive deoxyadenosine residues form less
stable hybrids with complementary RNA sequences than with the complementary DNA
sequences. Substitution of dA in the probe with either 2,6-diaminopurine
deoxyriboside, or 2'-alkoxy- or 2'-fluoro-dA enhances hybridization with RNA targets.

Internal structure within nucleic acid probes or the targets also influences
hybridization efficiency. For example, GC-rich sequences, and sequences containing
"runs" of consecutive G residues frequently self-associate to form higher-order
structures, and this can inhibit their binding to complementary sequences. See,
Zimmermann et al. (1975) J. Mol Biol 92: 181; Kim (1991) Nature 351: 331; Sen and
Gilbert (1988) Nature 335: 364; and Sundquist and Klug (1989) Nature 342: 825. These
structures are selectively

sugar-phosphate backbone has been replaced with a polyamide structure.

Thermal equilibrium studies, kinetic on-rate studies, and sequence specificity analysis
is optionally performed for target oligonucleotide and probe or probe analogue. The
data obtained shows the behavior of the analogues upon duplex formation with target
oligonucleotides. Altered duplex stability conferred by using oligonucleotide analogue
probes are ascertained by following, e.g., fluorescence signal intensity of
oligonucleotide analogue arrays hybridized with a target oligonucleotide over time. The
data allow optimization of specific hybridization conditions at, e.g., room temperature
(for simplified diagnostic applications).

Another way of verifying altered duplex stability is by following the signal intensity
generated upon hybridization with time. Previous experiments using DNA targets and DNA
chips have shown that signal intensity increases with time, and that the more stable
duplexes generate higher signal intensities faster than less stable duplexes. The
signals reach a plateau or "saturate" after a certain amount of time due to all of the
binding sites becoming occupied. These data allow for optimization of hybridization,
and determination of equilibration conditions at a specified temperature.

Graphs of signal intensity and base mismatch positions are plotted and the ratios of
perfect match versus mismatches calculated. This calculation shows the sequence
specific properties of nucleotide analogues as probes. Perfect match/mismatch ratios
greater than 4 are often desirable in an oligonucleotide diagnostic assay because, for
a diploid genome, ratios of 2 have to be distinguished (e.g., in the case of a
heterozygous trait or sequence).
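
For illustration, the perfect match/mismatch calculation described above can be
sketched as follows. The text does not give a formula for the heterozygous case, so the
sketch assumes a simple model in which a heterozygous sample contributes equal amounts
of a matching and a mismatching allele to the signal at a given probe. The function
names and the intensity values are hypothetical.

```python
def discrimination_ratio(perfect_match, mismatch):
    """Ratio of perfect-match to single-base-mismatch hybridization signal."""
    if mismatch <= 0:
        raise ValueError("mismatch intensity must be positive")
    return perfect_match / mismatch


def heterozygote_signal(perfect_match, mismatch):
    """Signal at one probe when only half of the target matches it.

    Assumption (not from the text): the two alleles are present in equal
    amounts and their signals add linearly.
    """
    return 0.5 * perfect_match + 0.5 * mismatch


if __name__ == "__main__":
    pm, mm = 400.0, 100.0                # hypothetical fluorescence intensities
    print(discrimination_ratio(pm, mm))  # 4.0
    het = heterozygote_signal(pm, mm)
    print(het)                           # 250.0
    print(pm / het)                      # 1.6
```

Under this assumed model, a perfect match/mismatch ratio of 4 leaves a
homozygous-to-heterozygous signal ratio of 1.6, and that ratio approaches 2 only as
discrimination improves, which is consistent with the preference for ratios above 4.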
destabilized by the substitution of one or more guanine ^ Comprise Nucleotide Ana- 
residues with one or more of the following purmes or purme ^ arg^^ ^""^'^'^ 

analogs: 7-deazaguanine. 8-aza-7-deazaguanine, ^ ... , nucleotides and nucleotide analogues are incor- 

2.aminopurine IH-purine, and hypoxanthine, m order to 35 into ^UA or RNA 

enhance hybndization. . , . , ,1,^ iareet nucleic acids for hybridization analysis to oligonucle- 

Modified nucleic acids and nucleic acid analogs can a^ J fe °raL THe incorporation of nucleotide analogues in 

be used to improve the th ar^et optimizes the hybridization of the target in terms 

Forexample.certain processes andconditions ^^^^^^^^^ of sequence specificity and/or the overall affinity of binding 

for either the fabrication or subsequent use of the array^^^ 40 °J '^^.^^^^^^^^ /^d oUgonucleotide analogue probe 

may not be compatible with standard oligonucleotide ^ ^^^^ ^ nucleotide analogues in either the oligo- 

chemistry, and alternate chemistry can be employed to ^^^^^ t^e target nucleic acid, or both, improves 

overcome these problems. For example, exposure to acjdic ^ ^ hybridation interactions. Examples of 

conditions will cause depunnaUon of purme ajicleotid^, op^ ^ ^^^^ substimted for natu- 

ultimately resulting in chain cleavage and overall degrada- 45 ^^^^^ aucleotMes include 7-deazaguanosine, 2,6- 

tionoftheprobearray.Inthiscase.adenmeaiidguanmeare .^^ nucleotides. 5.propynyl and other 

replaced with 7.deazaadenine arid 7-de a za guanine ^ ^itTd pyrimidine nucleotides. 2'.fluro and 

respectively, in order to stabiUze the oligonuc eotide probes J.^oS -2'-deL and the like, 

tow^s acidic conditions which are used durmg the manu- 2 ^f^^^-^^^^^ ^,,^,,,ed into nucleic 

facture or use of the arrays. a - adds usine the synthetic methods described supra, or using 

Base, phosphate and sugar m^ifications are used in '^^^^ j^, .^cleotide analogues are 

combination to make highly modified oligonucleotide ana- ^^^^^^^^^ j.to target nucleic acids using in 

logues which take advantage of the properues of each of S alStL method! such as PCR, LCR, 

various modifications. For example, ohgonucleotides which ""'^'^^^^ vitro transcription (e.g.. nick 

havehigher binding affinitiesforcomplementary^s^^^^^^ 55 JP; f^^^^^^^ transcription) and the like, 

than their umnodified counterparts (e.g.. 2 -O-methyl-. 2 -O- ^ ^ , ^ nucleotide analogues are optionaUy incor- 
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nucleotides substimte for natural pyrimidines to enhance carboxyl. Preferred surface attaching or derivit^^ 

target hybridization to certain purine rich probes. 2'-fluro include aminoalkylsilanes and bydroxyaUcylsilanes. In pa - 

and 2--methoxy-2'-deoxynucleo' des substimte for natural ticularly preferred embodunents. the surface a>^«=b'ng^r. 
nucleotide to^e„ha„ce\arget hybridization to similarly 

subsututed probe sequences. , triethoxysilylpropyl)-4.hydroxybutylamide, 

Synthesis of 5'-photoprotected 2'-0 alkyl nbonucleotide aminopropyltrietboxysilane- or hydroxypropyltrielhoxysi- 

analogues lane. 

The light-directed synthesis of complex arrays of nucleotide analogues on a glass
surface is achieved by derivatizing cyanoethyl phosphoramidite nucleotides and
nucleotide analogues (e.g., nucleoside analogues of uridine, thymidine, cytidine,
adenosine and guanosine, with phosphates) with, for example, the photolabile MeNPoc
group in the 5'-hydroxyl position instead of the usual dimethoxytrityl group. See,
application SN PCT/US94/12305.

Specific base-protected 2'-O alkyl nucleosides are commercially available, from, e.g.,
Chem Genes Corp. (MA). The photolabile MeNPoc group is added to the 5'-hydroxyl
position followed by phosphitylation to yield cyanoethyl phosphoramidite monomers.
Commercially available nucleosides are optionally modified (e.g., by 2'-O-alkylation)
to create nucleoside analogues which are used to generate oligonucleotide analogues.

Modifications to the above procedures are used in some embodiments to avoid significant
addition of MeNPoc to the 3'-hydroxyl position. For instance, in one embodiment, a
2'-O-methyl ribonucleotide analogue is reacted with DMT-Cl
{di(p-methoxyphenyl)phenylchloride} in the presence of pyridine to generate a
2'-O-methyl-5'-O-DMT ribonucleotide analogue. This allows for the addition of TBDMS to
the 3'-O of the ribonucleoside analogue by reaction with TBDMS-Triflate
(t-butyldimethylsilyltrifluoromethane-sulfonate) in the presence of triethylamine in
THF (tetrahydrofuran) to yield a 2'-O-methyl-3'-O-TBDMS-5'-O-DMT ribonucleotide base
analogue. This analogue is treated with TCAA (trichloroacetic acid) to cleave off the
DMT group, leaving a reactive hydroxyl group at the 5' position. MeNPoc is then added
to the oxygen of the 5' hydroxyl group using MeNPoc-Cl in the presence of pyridine. The
TBDMS group is then cleaved with F- (e.g., NaF) to yield a ribonucleotide base analogue
with a MeNPoc group attached to the 5' oxygen on the nucleotide analogue. If
appropriate, this analogue is phosphitylated to yield a phosphoramidite for
oligonucleotide analogue synthesis. Other nucleosides or nucleoside analogues are
protected by similar procedures.

The oligoribonucleotides generated by synthesis using ordinary ribonucleotides are
usually base labile due to the presence of the 2'-hydroxyl group.
2'-O-methyloligoribonucleotides (2'-OMeORNs), analogues of RNA where the 2'-hydroxyl
group is methylated, are DNAse and RNAse resistant, making them less base labile.
Sproat, B. S., and Lamond, A. I. in Oligonucleotides and Analogues: A Practical
Approach, edited by F. Eckstein, New York: IRL Press at Oxford University Press, 1991,
pp. 49-86, incorporated herein by reference for all purposes, have reported the
synthesis of mixed sequences of 2'-O-Methoxy-oligoribonucleotides (2'-O-MeORNs) using
dimethoxytrityl phosphoramidite chemistry. These 2'-O-MeORNs display greater binding
affinity for complementary nucleic acids than their unmodified counterparts.

Other embodiments of the invention provide mechanical means to generate oligonucleotide
analogues. These techniques are discussed in co-pending application Ser.
No. 07/796,243, filed Nov. 22, 1991, which is incorporated herein by reference in its
entirety for all purposes. Essentially, oligonucleotide analogue reagents are directed
over the surface of a substrate such that a predefined array of oligonucleotide
analogues is created. For instance, a series of channels, grooves, or spots are formed
on or adjacent to a substrate. Reagents are selectively flowed through or deposited in
the channels, grooves, or spots, forming an array having different oligonucleotides
and/or oligonucleotide analogues at selected locations on the substrate.

Detection of Hybridization

In one embodiment, hybridization is detected by labeling a target with, e.g.,
fluorescein or other known visualization agents and incubating the target with an array
of oligonucleotide probes. Upon duplex formation by the target with a probe in the
array (or triplex formation in embodiments where the array comprises unimolecular
double-stranded probes) the fluorescein label is excited by, e.g., an argon laser and
detected by viewing the array, e.g., through a scanning confocal microscope.

synthesis of OligonudeoUde Analogue Anays on Chips ^^^^^l^J^^^^^o^,, ^gbly reliant on 

Other than the ^^^.^'^^^l^Z'^^^^^^^f^' so comp"x pr3:r«Ld require Lbstantiall'nual effort, 
the nucleoside wuphngchem^try u^^^^ 50 P J^^, dNA sequencing technology is a laborious 
nology for synthesmng °l«?°"fl*°"''",''°^?58onu^^^^^ ^ electrophoretic size separation- of 
ottde analogues on chips ^ similar to that used for ohgo- P ^ag^e„ts; An alternative approach involves a 
nucleotide synthesis, mohgonucleot.de IS typ^^^^^^^ Si^tLstra^gy carried out by attaching target DMA to 
to the substrate via the 3--hydroxyl group of the ohgonucle d^na^ gy ^ oligonucle- 
otide and a functional group on the subsu^ate which results 55 J^^^^ ^J^^ ^ ^ ^^^^^ ^^^^^^ 
in the formation of an ether, ester, carbamate or phospnate 

ester linkage. Nucleotide or oligonucleotide analogues are ^^""J. j oligonucleotide probe array syn- 

bearing trichlorosilyl or trialkoxysilyl groups. The surface ^fl'^'f 

attaching groups have a site for attachment of the oligo- 65 oUdes to the matnces produced. 

nucleotide analogue PorUon.Forexample, groups which are Oligonucleotide ana^ope arrays are |«ed,e.g^.to^^^^ 

suiuble for atuchment include amines, hydroxyl. thiol, and sequence specific hybridization of nucleic acids, or protem- 
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nucleic acid interactions. Oligonucleotide analogue arrays 
are used to define the thermodynamic and kinetic rules 
governing the formation and stability of oligonucleotide and 
oligonucleotide analogue complexes. 

Oligonucleotide analogue Probe Arrays and Libraries 
The use of oligonucleotide analogues in probe arrays 
provides several benefits as compared to standard oligo- 
nucleotide arrays. For instance, as discussed supra, certain 
oligonucleotide analogues have enhanced hybridization 
characteristics to complementary nucleic acids as compared 
with oligonucleotides made of naturally occurring nucle- 
otides. One primary benefit of enhanced hybridization char- 
acteristics is that oligonucleotide analogue probes are 
optionally shorter than corresponding probes which do not 
include nucleotide analogues. 

Standard oligonucleotide probe arrays typically require 
fairly long probes (about 15-25 nucleotides) to achieve 
strong binding to target nucleic acids. The use of such long 
probes is disadvantageous for two reasons. First, the longer 
the probe, the more synthetic steps must be performed to 
make the probe and any probe array comprising the probe. 
This increases the cost of making the probes and arrays. 
Furthermore, as each synthetic step results in less than 100% 
coupling for every nucleotide, the quality of the probes 
degrades as they become longer. Secondly, short probes 
provide better mis-match discrimination for hybridization to 
a target nucleic acid. This is because a single base mismatch
for a short probe-target hybridization is more destabilizing
than a single mismatch for a long probe-target hybridization.
Thus, it is harder to distinguish a single probe-target mis- 
match when the probe is a 20-mer than when the probe is an 
8-mer. Accordingly, the use of short oligonucleotide ana- 
logue probes reduces costs and increases mismatch discrimi- 
nation in probe arrays. 
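
The effect of incomplete coupling on probe quality can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that a probe of length n takes n coupling steps and that every step succeeds independently with the same efficiency; the 0.95 figure is hypothetical and is not a value taken from the specification.

```python
def full_length_fraction(n_steps: int, coupling_efficiency: float) -> float:
    """Fraction of probes that are full length after n_steps couplings,
    assuming each step succeeds independently with the same efficiency."""
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must lie in [0, 1]")
    return coupling_efficiency ** n_steps


# Hypothetical per-step efficiency, chosen only for illustration.
ASSUMED_EFFICIENCY = 0.95

for length in (8, 12, 20):
    fraction = full_length_fraction(length, ASSUMED_EFFICIENCY)
    print(f"{length}-mer: {fraction:.1%} full-length probe")
```

Under this assumed efficiency, about 66% of 8-mer probes would be full length, against about 36% of 20-mer probes, which is the sense in which probe quality degrades with length.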

The enhanced hybridization characteristics of oligonucleotide
analogues also allow for the creation of oligonucleotide
analogue probe arrays where the probes in the arrays
have substantial secondary structure. For instance, the oli- 
gonucleotide analogue probes are optionally configured to 
be fully or partially double stranded on the array. The probes 
are optionally complexed with complementary nucleic 
acids, or are optionally unimolecular oligonucleotides with 
self-complementary regions. Libraries of diverse double- 
stranded oligonucleotide analogue probes are used, for 
example, in screening studies to determine binding affinity 
of nucleic acid binding proteins, drugs, or oligonucleotides 
(e.g., to examine triple helix formation). Specific oligonucle- 
otide analogues are known to be conducive to the formation 
of unusual secondary structure. See, Durland (1995)
Bioconjugate Chem. 6: 278-282. General strategies for using
unimolecular double-stranded oligonucleotides as probes
and for library generation are described in application Ser. No.
08/327,687, and similar strategies are applicable to
oligonucleotide analogue probes.

In general, a solid support, which optionally has an 
attached spacer molecule is attached to the distal end of the 
oligonucleotide analogue probe. The probe is attached as a 
single unit, or synthesized on the support or spacer in a 
monomer by monomer approach using the VLSIPS™ or 
mechanical partitioning methods described supra. Where the 
oligonucleotide analogue arrays are fully double-stranded, 
oligonucleotides (or oligonucleotide analogues) comple- 
mentary to the probes on the array are hybridized to the 
array. 

In some embodiments, molecules other than 
oligonucleotides, such as proteins, dyes, co-factors, linkers 
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and the like are incorporated into the oligonucleotide ana- 
logue probe, or attached to the distal end of the oligomer, 
e.g., as a spacing molecule, or as a probe or probe target. 
Flexible linkers are optionally used to separate complementary
portions of the oligonucleotide analogue.

The present invention also contemplates the preparation 
of libraries of oligonucleotide analogues having bulges or 
loops in addition to complementary regions. Specific RNA 
bulges are often recognized by proteins (e.g., TAR RNA is
recognized by the TAT protein of HIV). Accordingly, libraries
of oligonucleotide analogue bulges or loops are useful in
a number of diagnostic applications. The bulge or loop can 
be present in the oligonucleotide analogue or linker portions. 

Unimolecular analogue probes can be configured in a 
variety of ways. In one embodiment, the unimolecular 
probes comprise linkers, for example, where the probe is 
arranged according to the formula Y-L¹-X¹-L²-X², in
which Y represents a solid support, X¹ and X² represent a
pair of complementary oligonucleotides or oligonucleotide
analogues, L¹ represents a bond or a spacer, and L² represents
a linking group having sufficient length such that X¹
and X² form a double-stranded oligonucleotide. The general
synthetic and conformational strategy used in generating the
double-stranded unimolecular probes is similar to that
described in co-pending application Ser. No. 08/327,687,
except that any of the elements of the probe (L¹, X¹, L² and
X²) comprises a nucleotide or an oligonucleotide analogue.
For instance, in one embodiment X¹ is an oligonucleotide
analogue.
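
The probe formula above can be pictured as a small data structure. The following is a minimal sketch only: the class name, field names, and the convention that both strands are written 5'->3' are assumptions made for illustration, and ordinary DNA letters stand in for whatever analogues a real probe would contain.

```python
from dataclasses import dataclass

# Translation table for the four standard DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class UnimolecularProbe:
    """Illustrative model of a probe laid out as Y-L1-X1-L2-X2.

    The solid support Y is implicit; all field names are hypothetical.
    """
    l1: str  # bond or spacer next to the support ("" for a bond)
    x1: str  # first strand, written 5'->3' (assumed convention)
    l2: str  # linking group joining the two strands
    x2: str  # second strand, written 5'->3' (assumed convention)

    def forms_duplex(self) -> bool:
        """True when X2 is the exact reverse complement of X1."""
        return self.x2.upper() == reverse_complement(self.x1)


probe = UnimolecularProbe(l1="", x1="CTGAACGG", l2="linker", x2="CCGTTCAG")
print(probe.forms_duplex())
```

The check is deliberately strict; probes with bulges or loops, which the specification also contemplates, would need a looser comparison.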

The oligonucleotide analogue probes are optionally
arranged to present a variety of moieties. For example,
structural components are optionally presented from the
middle of a conformationally restricted oligonucleotide
analogue probe. In these embodiments, the analogue probes
generally have the structure -X¹¹-Z-X¹² wherein X¹¹ and
X¹² are complementary oligonucleotide analogues and Z is
a structural element presented away from the surface of the
probe array. Z can include an agonist or antagonist for a cell
membrane receptor, a toxin, venom, viral epitope, hormone,
peptide, enzyme, cofactor, drug, protein, antibody or the
like.

General tiling strategies for detection of a Polymorphism 
in a target oligonucleotide 

In diagnostic applications, oligonucleotide analogue
arrays (e.g., arrays on chips, slides or beads) are used to
determine whether there are any differences between a
reference sequence and a target oligonucleotide, e.g.,
whether an individual has a mutation or polymorphism in a
known gene. As discussed supra, the oligonucleotide target
is optionally a nucleic acid such as a PCR amplicon which
comprises one or more nucleotide analogues. In one
embodiment, arrays are designed to contain probes exhibiting
complementarity to one or more selected reference
sequences whose sequence is known. The arrays are used to
read a target sequence comprising either the reference
sequence itself or variants of that sequence. Any
polynucleotide of known sequence is selected as a reference sequence.
Reference sequences of interest include sequences known to
include mutations or polymorphisms associated with
phenotypic changes having clinical significance in human
patients. For example, the CFTR gene and P53 gene in
humans have been identified as the location of several
mutations resulting in cystic fibrosis or cancer respectively.
Other reference sequences of interest include those that
serve to identify pathogenic microorganisms and/or are the
site of mutations by which such microorganisms acquire
drug resistance (e.g., the HIV reverse transcriptase gene for
HIV resistance). Other reference sequences of interest
include regions where polymorphic variations are known to
occur (e.g., the D-loop region of mitochondrial DNA).
These reference sequences also have utility for, e.g.,
forensic, cladistic, or epidemiological studies.

Other reference sequences of interest include those from
the genome of pathogenic viruses (e.g., hepatitis (A, B, or
C), herpes virus (e.g., VZV, HSV-1, HAV-6, HSV-II, CMV,
and Epstein Barr virus), adenovirus, influenza virus,
flaviviruses, echovirus, rhinovirus, coxsackie virus,
coronavirus, respiratory syncytial virus, mumps virus,
rotavirus, measles virus, rubella virus, parvovirus, vaccinia
virus, HTLV virus, dengue virus, papillomavirus, molluscum
virus, poliovirus, rabies virus, JC virus and arboviral
encephalitis virus). Other reference sequences of interest are
from genomes or episomes of pathogenic bacteria, particularly
regions that confer drug resistance or allow phylogenic
characterization of the host (e.g., 16S rRNA or corresponding
DNA). For example, such bacteria include chlamydia,
rickettsial bacteria, mycobacteria, staphylococci, streptococci,
pneumococci, meningococci and gonococci, klebsiella,
proteus, serratia, pseudomonas, legionella, diphtheria,
salmonella, bacilli, cholera, tetanus, botulism, anthrax,
plague, leptospirosis, and Lyme disease bacteria. Other
reference sequences of interest include those in which
mutations result in the following autosomal recessive
disorders: sickle cell anemia, β-thalassemia, phenylketonuria,
galactosemia, Wilson's disease, hemochromatosis, severe
combined immunodeficiency, alpha-1-antitrypsin
deficiency, albinism, alkaptonuria, lysosomal storage
diseases and Ehlers-Danlos syndrome. Other reference
sequences of interest include those in which mutations result
in X-linked recessive disorders: hemophilia, glucose-6-
phosphate dehydrogenase, agammaglobulinemia, diabetes
insipidus, Lesch-Nyhan syndrome, muscular dystrophy,
Wiskott-Aldrich syndrome, Fabry's disease and fragile
X-syndrome. Other reference sequences of interest include
those in which mutations result in the following autosomal
dominant disorders: familial hypercholesterolemia,
polycystic kidney disease, Huntington's disease, hereditary
spherocytosis, Marfan's syndrome, von Willebrand's
disease, neurofibromatosis, tuberous sclerosis, hereditary
hemorrhagic telangiectasia, familial colonic polyposis,
Ehlers-Danlos syndrome, myotonic dystrophy, muscular
dystrophy, osteogenesis imperfecta, acute intermittent
porphyria, and von Hippel-Lindau disease.

Although an array of oligonucleotide analogue probes is
usually laid down in rows and columns for simplified data
processing, such a physical arrangement of probes on the
solid substrate is not essential. Provided that the spatial
location of each probe in an array is known, the data from
the probes is collected and processed to yield the sequence
of a target irrespective of the physical arrangement of the
probes on a chip. In processing the data, the hybridization
signals from the respective probes are assembled into
any conceptual array desired for subsequent data reduction,
whatever the physical arrangement of probes on the
substrate.

In one embodiment, a basic tiling strategy provides an
array of immobilized probes for analysis of a target
oligonucleotide showing a high degree of sequence similarity to
one or more selected reference oligonucleotides (e.g.,
detection of a point mutation in a target sequence). For instance,
a first probe set comprises a plurality of probes exhibiting
perfect complementarity with a selected reference
oligonucleotide. The perfect complementarity usually exists
throughout the length of the probe. However, probes having
a segment or segments of perfect complementarity that is/are
flanked by leading or trailing sequences lacking
complementarity to the reference sequence can also be used. Within
a segment of complementarity, each probe in the first probe
set has at least one interrogation position that corresponds to
a nucleotide in the reference sequence. The interrogation
position is aligned with the corresponding nucleotide in the
reference sequence when the probe and reference sequence
are aligned to maximize complementarity between the two.
If a probe has more than one interrogation position, each
corresponds with a respective nucleotide in the reference
sequence. The identity of an interrogation position and
corresponding nucleotide in a particular probe in the first
probe set cannot be determined simply by inspection of the
probe in the first set. An interrogation position and
corresponding nucleotide is defined by the comparative structures
of probes in the first probe set and corresponding probes
from additional probe sets.

For each probe in the first set, there are, for purposes of
the present illustration, multiple corresponding probes from
additional probe sets. For instance, there are optionally
probes corresponding to each nucleotide of interest in the
reference sequence. Each of the corresponding probes has an
interrogation position aligned with that nucleotide of
interest. Usually, the probes from the additional probe sets are
identical to the corresponding probe from the first probe set
with one exception. The exception is the interrogation
position, which occurs in the same position in each of the
corresponding probes from the additional probe sets. This
position is occupied by a different nucleotide in the
corresponding probe sets. Other tiling strategies are also
employed, depending on the information to be obtained.

The probes are oligonucleotide analogues which are
capable of hybridizing with a target nucleic sequence by
complementary base-pairing. Complementary base pairing
includes sequence-specific base pairing, which comprises,
e.g., Watson-Crick base pairing or other forms of base
pairing such as Hoogsteen base pairing. The probes are
attached by any appropriate linkage to a support. 3'
attachment is more usual as this orientation is compatible with the
preferred chemistry used in solid phase synthesis of
oligonucleotides and oligonucleotide analogues (with the
exception of, e.g., analogues which do not have a phosphate
backbone, such as peptide nucleic acids).

EXAMPLES

The following examples are provided by way of illustration
only and not by way of limitation. Those of skill will readily
recognize a variety of noncritical parameters which can be
changed or modified to yield essentially similar results.

One approach to enhancing oligonucleotide hybridization
is to increase the thermal stability (Tm) of the duplex formed
between the target and the probe using oligonucleotide
analogues that are known to increase Tm's upon hybridization
to DNA. Enhanced hybridization using oligonucleotide
analogues is described in the examples below, including
enhanced hybridization in oligonucleotide arrays.

Example 1

Solution oligonucleotide melting Tm

The Tm of 2'-O-methyl oligonucleotide analogues was
compared to the Tm for the corresponding DNA and RNA
sequences in solution. In addition, the Tm of 2'-O-methyl
oligonucleotide:DNA, 2'-O-methyl oligonucleotide:RNA
and RNA:DNA duplexes in solution was also determined.
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The T„ was determined by varying the sample temperature 
and monitoring the absorbance of the sample solution at 260
nm. The oligonucleotide samples were dissolved in a 0.1M
NaCl solution with an oligonucleotide concentration of 2
μM. Table 1 summarizes the results of the experiment. The
results show that the hybridization of DNA in solution has
approximately the same Tm as the hybridization of DNA
with a 2'-O-methyl-substituted oligonucleotide analogue.
The results also show that the Tm for the 2'-O-methyl-
substituted oligonucleotide duplex is higher than that for the
corresponding RNA:2'-O-methyl-substituted oligonucleotide
duplex, which is higher than the Tm for the corresponding
DNA:DNA or RNA:DNA duplex.

TABLE 1

Solution Oligonucleotide Melting Experiments
(+) = Target Sequence (5'-CTGAACGGTAGCATCTTGAC-3') (SEQ ID NO: 6)*
(-) = Complementary Sequence (5'-GTCAAGATGCTACCGTTCAG-3') (SEQ ID NO: 7)*

Type of Oligonucleotide,    Type of Oligonucleotide,
Target Sequence (+)         Complementary Sequence (-)    Tm (° C.)

DNA(+)                      DNA(-)                        61.6
DNA(+)                      2'OMe(-)                      58.6
2'OMe(+)                    DNA(-)                        61.6
2'OMe(+)                    2'OMe(-)                      78.0
RNA(+)                      DNA(-)                        58.2
RNA(+)                      2'OMe(-)                      73.6

*T refers to thymine for the DNA oligonucleotides, or uracil for the RNA
oligonucleotides.

Example 2

Array hybridization experiments with DNA chips and
oligonucleotide analogue targets

A variable length DNA probe array on a chip was
designed to discriminate single base mismatches in the 3
corresponding sequences
5'-CTGAACGGTAGCATCTTGAC-3' (SEQ ID NO:6)
(DNA target), 5'-CUGAACGGUAGCAUCUUGAC-3'
(SEQ ID NO:8) (RNA target) and
5'-CUGAACGGUAGCAUCUUGAC-3' (SEQ ID NO:9)
(2'-O-methyl oligonucleotide target), and generated by the
VLSIPS™ procedure. The chip was designed with adjacent
12-mers and 8-mers which overlapped with the 3 target
sequences as shown in Table 2.

TABLE 2

Array hybridization Experiments

Target 1 (DNA)               5'-CTGAACGGTAGCATCTTGAC-3' (SEQ ID NO: 6)
8-mer probe (complement)
12-mer probe (complement)
Target 2 (RNA)               5'-CUGAACGGUAGCAUCUUGAC-3' (SEQ ID NO: 8)
8-mer probe (complement)
12-mer probe (complement)
Target 3 (2'-O-Me oligo)     5'-CUGAACGGUAGCAUCUUGAC-3' (SEQ ID NO: 9)
8-mer probe (complement)
12-mer probe (complement)

Target oligos were synthesized using standard techniques.
The DNA and 2'-O-methyl oligonucleotide analogue target
oligonucleotides were hybridized to the chip at a
concentration of 10 nM in 5x SSPE at 20° C. in sequential
experiments. Intensity measurements were taken at each
probe position in the 8-mer and 12-mer arrays over time. The
rate of increase in intensity was then plotted for each probe
position. The rate of increase in intensity was similar for
both targets in the 8-mer probe arrays, but the 12-mer probes
hybridized more rapidly to the DNA target oligonucleotide.

Plots of intensity versus probe position were generated for
the RNA, DNA and 2'-O-methyl oligonucleotides to
ascertain mismatch discrimination. The 8-mer probes displayed
similar mismatch discrimination against all targets. The
12-mer probes displayed the highest mismatch discrimination
for the DNA targets, followed by the 2'-O-methyl target,
with the RNA target showing the poorest mismatch
discrimination.

Thermal equilibrium experiments were performed by
hybridizing each of the targets to the chip for 90 minutes at
5° C. temperature intervals. The chip was hybridized with
the target in 5x SSPE at a target concentration of 10 nM.
Intensity measurements were taken at the end of the 90
minute hybridization at each temperature point as described
above. All of the targets displayed similar stability, with
minimal hybridization to the 8-mer probes at 30° C. In
addition, all of the targets showed similar stability in
hybridizing to the 12-mer probes. Thus, the 2'-O-methyl
oligonucleotide target had similar hybridization characteristics to
DNA and RNA targets when hybridized against DNA
probes.

Example 3

2'-O-methyl-substituted oligonucleotide chips

DMT-protected DNA and 2'-O-methyl phosphoramidites
were used to synthesize 8-mer probe arrays on a glass slide
using the VLSIPS™ method. The resulting chip was
hybridized to DNA and RNA targets in separate experiments. The
target sequence, the sequences of the probes on the chip and
the general physical layout of the chip are described in Table
3.

The chip was hybridized to the RNA and DNA targets in
successive experiments. The hybridization conditions used
were 10 nM target, in 5x SSPE. The chip and solution were
heated from 20° C. to 50° C., with a fluorescence
measurement taken at 5 degree intervals as described in SN PCT/
US94/12305. The chip and solution were maintained at each
temperature for 90 minutes prior to fluorescence
measurements. The results of the experiment showed that DNA
probes were equal or superior to 2'-O-methyl oligonucleotide
analogue probes for hybridization to a DNA target, but
that the 2'-O-methyl analogue oligonucleotide probes
showed dramatically better hybridization to the RNA target
than the DNA probes. In addition, the 2'-O-methyl analogue
oligonucleotide probes showed superior mismatch
discrimination of the RNA target compared to the DNA probes. The
difference in fluorescence intensity between the matched and
mismatched analogue probes was greater than the difference
between the matched and mismatched DNA probes,
dramatically increasing the signal-to-noise ratio. FIG. 1
displays the results graphically (FIGS. 1A and 1B). (M) and (P)
indicate mismatched and perfectly matched probes,
respectively. FIGS. 1C and 1D illustrate the fluorescence
intensity versus location on an example chip for the various
probes at 20° C. using DNA and RNA targets, respectively.

TABLE 3

2'-O-methyl Oligonucleotide Analogues on a Chip.

Target Sequence (DNA):  5'-CTGAACGGTAGCATCTTGAC-3' (SEQ ID NO: 6)
Target Sequence (RNA):  5'-CUGAACGGUAGCAUCUUGAC-3' (SEQ ID NO: 8)

Matching DNA oligonucleotide probe {DNA (M)}:  5'-CTTGCCAT (SEQ ID NO: 10)
Matching 2'-O-methyl oligonucleotide analogue probe:  5'-CUUGCCAU (SEQ ID NO: 11)
DNA oligonucleotide probe with 1 base mismatch {DNA (P)}:  5'-CTTGCTAT (SEQ ID NO: 12)
2'-O-methyl oligonucleotide analogue probe with 1 base mismatch {2'OMe (M)}:  5'-CUUGCUAU (SEQ ID NO: 13)

SCHEMATIC REPRESENTATION OF 2'-O-METHYL/DNA CHIP

Matching 2'-O-methyl oligonucleotide analogue probe
2'-O-methyl oligonucleotide analogue probe with 1 base mismatch
DNA oligonucleotide probe with 1 base mismatch
Matching DNA oligonucleotide probe
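
The relationship between the Table 3 probes and the target can be checked mechanically. The sketch below is illustrative only and rests on one stated assumption: each probe string is compared base by base against a target window in the order printed, which corresponds to the probe lying antiparallel to the target. The function names are hypothetical, and U is treated as T so the RNA and 2'-O-methyl sequences share one pairing table.

```python
# Watson-Crick partner of each base; U is normalized to T beforehand so
# RNA and 2'-O-methyl sequences can use the same table.
PARTNER = {"A": "T", "C": "G", "G": "C", "T": "A"}

TARGET_DNA = "CTGAACGGTAGCATCTTGAC"  # SEQ ID NO: 6
TARGET_RNA = "CUGAACGGUAGCAUCUUGAC"  # SEQ ID NO: 8


def normalize(seq: str) -> str:
    """Upper-case the sequence and map U to T."""
    return seq.upper().replace("U", "T")


def mismatch_positions(probe: str, window: str) -> list[int]:
    """0-based probe positions that do not pair with the facing target base."""
    pairs = zip(normalize(probe), normalize(window))
    return [i for i, (p, t) in enumerate(pairs) if PARTNER[t] != p]


def best_window(probe: str, target: str) -> tuple[int, list[int]]:
    """Slide the probe along the target and return (offset, mismatches)
    for the first window with the fewest mismatches."""
    k = len(probe)
    candidates = [
        (offset, mismatch_positions(probe, target[offset:offset + k]))
        for offset in range(len(target) - k + 1)
    ]
    return min(candidates, key=lambda c: len(c[1]))


for name, probe, target in [
    ("SEQ ID NO: 10", "CTTGCCAT", TARGET_DNA),
    ("SEQ ID NO: 12", "CTTGCTAT", TARGET_DNA),
    ("SEQ ID NO: 11", "CUUGCCAU", TARGET_RNA),
    ("SEQ ID NO: 13", "CUUGCUAU", TARGET_RNA),
]:
    offset, mismatches = best_window(probe, target)
    print(name, "offset", offset, "mismatch positions", mismatches)
```

Under that reading, the matching probes pair fully with target bases 3 through 10 (GAACGGTA), and the mismatch probes differ from them at a single position, the sixth base of the probe.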



Example 4

Synthesis of oligonucleotide analogues

The reagent MeNPoc-Cl reacts non-selectively
with both the 5' and 3' hydroxyls on 2'-O-methyl nucleoside
analogues. Thus, to generate high yields of 5'-O-MeNPoc-
2'-O-methylribonucleoside analogues for use in
oligonucleotide analogue synthesis, the following protection-
deprotection scheme was utilized.

The protective group DMT was added to the 5'-O position
of the 2'-O-methylribonucleoside analogue in the presence
of pyridine. The resulting 5'-O-DMT protected analogue
was reacted with TBDMS-Triflate in THF, resulting in the
addition of the TBDMS group to the 3'-O of the analogue.
The 5'-DMT group was then removed with TCAA to yield
a free OH group at the 5' position of the 2'-O-methyl
ribonucleoside analogue, followed by the addition of
MeNPoc-Cl in the presence of pyridine, to yield 5'-O-
MeNPoc-3'-O-TBDMS-2'-O-methyl ribonucleoside
analogue. The TBDMS group was then removed by reaction
with NaF, and the 3'-OH group was phosphitylated using
standard techniques.

Two other potential strategies did not result in high
specific yields of 5'-O-MeNPoc-2'-O-methylribonucleoside.
In the first, a less reactive MeNPoc derivative was
synthesized by reacting MeNPoc-Cl with N-hydroxysuccinimide to
yield MeNPoc-NHS. This less reactive photocleavable
group (MeNPoc-NHS) was found to react exclusively with
the 3' hydroxyl on the 2'-O-methylribonucleoside analogue.
In the second strategy, an organotin protection scheme was
used. Dibutyltin oxide was reacted with the 2'-O-
methylribonucleoside analogue followed by reaction with
MeNPoc. Both 5'-O-MeNPoc and 3'-O-MeNPoc 2'-O-
methylribonucleoside analogues were obtained.

Example 5

Hybridization to mixed-sequence oligodeoxynucleotide
probes substituted with 2-amino-2'-deoxyadenosine (D)

To test the effect of a 2-amino-2'-deoxyadenosine (D)
substitution in a heterogeneous probe sequence, two 4x4
oligodeoxynucleotide arrays were constructed using
VLSIPS™ methodology and 5'-O-MeNPOC-protected
deoxynucleoside phosphoramidites. Each array was
comprised of the following set of probes based on the sequence
(3')-CATCGTAGAA-(5') (SEQ ID NO:1):
1. -(HEG)-(3')-CATN1GTAGAA-(5') (SEQ ID NO:14)
2. -(HEG)-(3')-CATCN2TAGAA-(5') (SEQ ID NO:15)
3. -(HEG)-(3')-CATCGN3AGAA-(5') (SEQ ID NO:16)
4. -(HEG)-(3')-CATCGTN4GAA-(5') (SEQ ID NO:17)
where HEG=hexaethyleneglycol linker, and N is either
A, G, C or T, so that probes are obtained which contain single
mismatches introduced at each of four central locations in
the sequence. The first probe array was constructed with all
natural bases. In the second array, 2-amino-2'-
deoxyadenosine (D) was used in place of 2'-deoxyadenosine (A).
Both arrays were hybridized with a 5'-fluorescein-labeled
oligodeoxynucleotide target, (5')-Fl-d
(CTGAACGGTAGCATCTTGAC)-(3') (SEQ ID NO:18),
which contained a sequence (in bold) complementary to the
base probe sequence. The hybridization conditions were: 10
nM target in 5x SSPE buffer at 22° C. with agitation. After
30 minutes, the chip was mounted on the flowcell of a
scanning laser confocal fluorescence microscope, rinsed
briefly with 5x SSPE buffer at 22° C., and then a surface
fluorescence image was obtained.

The relative efficiency of hybridization of the target to the
complementary and single-base mismatched probes was
determined by comparing the average bound surface
fluorescence intensity in those regions of the array
containing the individual probe sequences. The results (FIG.
3) show that a 2-amino-2'-deoxyadenosine (D) substitution
in a heterogeneous probe sequence is a relatively neutral
one, with little effect on either the signal intensity or the
specificity of DNA-DNA hybridization, under conditions
where the target is in excess and the probes are saturated.

Example 6

Hybridization to a dA-homopolymer oligodeoxynucleotide
probe substituted with 2-amino-2'-deoxyadenosine (D)

The following experiment was performed to compare the
hybridization of 2'-deoxyadenosine containing homopolymer
arrays with 2-amino-2'-deoxyadenosine homopolymer
arrays. The experiment was performed on two 11-mer
oligodeoxynucleotide probe containing arrays. Two 11-mer
oligodeoxynucleotide probe sequences were synthesized on
a chip using 5'-O-MeNPOC-protected nucleoside
phosphoramidites and standard VLSIPS™ methodology.

The sequence of the first probe was: (HEG)-(3')-d
(AAAAANAAAAA)-(5') (SEQ ID NO:19); where
HEG=hexaethyleneglycol linker, and N is either A, G, C or T. The
second probe was the same, except that dA was replaced by
2-amino-2'-deoxyadenosine (D). The chip was hybridized
with a 5'-fluorescein-labeled oligodeoxynucleotide target,
(5')-Fl-d(TTTTTGTTTTT)-(3') (SEQ ID NO:20), which
contained a sequence complementary to the probe sequences
where N=C. Hybridization conditions were 10 nM target in
5x SSPE buffer at 22° C. with agitation. After 15 minutes,
the chip was mounted on the flowcell of a scanning laser
confocal fluorescence microscope, rinsed briefly with 5x
SSPE buffer at 22° C. (low stringency), and a surface
fluorescence image was obtained. Hybridization to the chip
was continued for another 5 hours, and a surface
fluorescence image was acquired again. Finally, the chip was
washed briefly with 0.5x SSPE (high-stringency), then with
5x SSPE, and re-scanned.

The relative efficiency of hybridization of the target to the
complementary and single-base mismatched probes was
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determined by comparing the average bound surface fluo- A 16x64 oUgonucleotide array was co^ructcd using 

rescence intensity in those regions of the of the array VLSIPS™ methodology, with 5'-0-MeNPOC-protcctcd 

containing the individual probe sequences. The results (FIG. nucleoside phosphoramidites, including the analogs ddG, 

4) indicate that substituting 2*-deoxyadenosine with ^nd dl. The array was comprised of the set of probes 
2-amino-2'-deoxyadenosine in a d(A)„ homopolymcr probe ^ represented by the following sequence: -(linker)-(3')-d(A T 

sequence results in a significant enhancement in specific G TT Gj G2 G3 G4 G5 C G 0 0 T)-(50; (SEQ ID NO:28) 

hybridization to a complementary oligodeoxynucleotide ^^^^^ underiined bases are fixed, and the five interna! 

sequence. deoxyguanosines (G 1.5) are substituted with G, ddG, dl, and 

Example 7 T in all possible (1024 toul) combinations. A complemen- 

Hybridization to alternating A-T oligodeoxynucleotide 10 ^ry f f^^^^^^^^^^^ 
probes substituted with 5-propynyl-2'-deoxyuridine (P) and S'-end: (5>F1^(C A A T A C A A ^ C C C C G C C C A 
2^mko-2'^eoxyadenosine (D) T C CH3) (SEQ ID NO:29). was hybridized to the aaay^ 

Commercially available S'-DMT-protected The hybridization conditions were: 5 nM target m 6xSSPE 
2*Kleoxynucleoside/nucleoside.analog phosphoramidites buffer at 22° C. with shaking. After 30 mmutes, the chip was 
(Glen Research) were used to synthesize two decanucleotide 15 mounted on the flowcell of an Affymetnx scannmg l^r 
probe sequences on separate areas on a chip using a modified confocal fluorescence microscope, rinsed once with 0.25 x 
VLSIPS''" procedure. In this procedure, a glass substrate is SSPE buffer at 22** C, and then a surface fluorescence image 
initially modified with a terminal-MeNPOC-protected hexa- was acquired. 

elhyleneglycol linker. The substrate was exposed to light «efl5ciency" of target hybridization to each probe in 

through a mask to remove the protecting group from the 20 the array is proportional to the bound surface fluorescence 
Unker in a checkerboard pattern. The first probe sequence intensity in the region of the chip where the probe was 
was then synthesized in the exposed region using DMT-
phosphoramidites with acid-deprotection [illegible]
sequence was finally capped with (MeO)2PNiPr2/tetrazole
followed by oxidation. A [illegible]
a different (previously unexposed) region of the chip was
then performed, and the second probe sequence was syn-
thesized by the same procedure. The sequence of the first
"control" probe was -(HEG)-(3')-CGCGCCGCGC-(5')
(SEQ ID NO:21); and the sequence of the second probe was
one of the following:

1. -(HEG)-(3')-d(ATATAATATA)-(5') (SEQ ID NO:22)

2. -(HEG)-(3')-d(APAPAAPAPA)-(5') (SEQ ID NO:23)

3. -(HEG)-(3')-d(DTDTDDTDTD)-(5') (SEQ ID NO:24)

4. -(HEG)-(3')-d(DPDPDDPDPD)-(5') (SEQ ID NO:25)

where HEG = hexaethyleneglycol linker, [illegible]
chip was then hybridized in a solution of a fluorescein-
labeled oligodeoxynucleotide target, (5')-Fluorescein-d
(TATATTATAT)-(HEG)-d(GCGCGGCGCG)-(3') (SEQ ID
NO:26 and SEQ ID NO:27), which is complementary to
both the A/T and G/C probes. The hybridization conditions
were: 10 nM target in 5x SSPE buffer at 22° C. with gentle
shaking. After 3 hours, the chip was mounted on the flowcell
of a scanning laser confocal fluorescence microscope, rinsed
briefly with 5x SSPE buffer at 22° C., and then a surface
fluorescence image was obtained. Hybridization to the chip
was continued overnight (total hybridization time ~20 hr),
and a surface fluorescence image was acquired again.

The relative efficiency of hybridization of the [illegible]
A/T and substituted A/T probes was determined by compar-
ing the average surface fluorescence intensity bound to those
parts of the chip containing the A/T or substituted probe to
the fluorescence intensity bound to the G/C control probe
sequence. The results (FIG. 5) show that 5-propynyl-dU and
2-amino-dA substitution in an A/T-rich probe significantly
enhances the affinity of an oligonucleotide analogue for
complementary target sequences. The unsubstituted A/T-
probe bound only 20% as much target as the all-G/C-probe
of the same length, while the D- & P-substituted A/T probe
bound nearly as much (90%) as the G/C-probe. Moreover,
the kinetics of hybridization are such that, at early times, the
amount of target bound to the substituted A/T probes
exceeds that which is bound to the all-G/C probe.

Example 8

Hybridization to oligodeoxynucleotide probes substituted
with 7-deaza-2'-deoxyguanosine (ddG) and 2'-deoxyinosine
(dI)

[...] synthesized. The relative values for a subset of probes (those
containing dG-ddG and dG-dI substitutions only) are
[illegible]. Substitution of guanosine with
7-deazaguanosine within the internal run of five G's results
[illegible] enhancement in the fluorescence signal
intensity which measures hybridization. Deoxyinosine sub-
stitutions also enhance hybridization to the probe, but to a
lesser extent. In this example, the best overall enhancement
is realized when the dG "run" is ~40-60% substituted with
7-deaza-dG, with the substitutions distributed evenly
throughout the run (i.e., alternating dG/deaza-dG).

Example 9

Synthesis of 5'-MeNPOC-2'-deoxyinosine-3'-(N,N-
diisopropyl-2-cyanoethyl)phosphoramidite.

2'-deoxyinosine (5.0 g, 20 mmole) was dissolved in 50 ml
of dry DMF, and 100 ml dry pyridine was added and
evaporated three times to dry the solution. Another 50 ml
pyridine was added, the solution was cooled to -20° C.
under argon, and 13.8 g (50 mmole) of MeNPOC-chloride
[illegible] 20 ml dry DCM was then added dropwise with stirring
[illegible] 59 minutes. After 60 minutes, the cold bath was
removed, and the solution was allowed to stir overnight at
room temperature. Pyridine and DCM were removed by
evaporation, 500 ml of ethyl acetate was added, and the
[illegible] washed twice with water and then with brine.
[illegible] aqueous washes were combined and
back-extracted twice with ethyl acetate, and then all of the
[illegible]. The product was recrystallized
from DCM to obtain 5.0 g (50% yield) 5'-O-
MeNPOC-2'-deoxyinosine as a yellow solid (99% purity,
according to 1H-NMR and HPLC analysis).

The MeNPOC-nucleoside (2.5 g, 5.1 mmole) was sus-
pended in 60 ml of dry CH3CN and phosphitylated with
2-cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite
([illegible] g/[illegible] ml; 5.5 mmole) and 0.47 g (2.7 mmole) of
diisopropylammonium tetrazolide, according to the pub-
lished procedure of Barone, et al. (Nucleic Acids Res. (1984)
[illegible] 4051-61). The crude phosphoramidite was purified by
chromatography on silica gel (90:8:2 DCM-MeOH-
Et3N), co-evaporated twice with anhydrous acetonitrile and
dried under vacuum for ~24 hours to obtain [illegible]
the pure product [illegible]
by 1H/31P-NMR and HPLC).
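The relative hybridization efficiency used in Example 7 (mean surface fluorescence on the A/T or substituted probe region divided by the mean on the G/C control region) can be illustrated with a minimal sketch. The function name and the intensity values below are hypothetical, chosen only so that the ratios reproduce the 20% and 90% figures quoted in the text; they are not measured data.

```python
from statistics import mean

def relative_efficiency(probe_pixels, control_pixels):
    """Mean fluorescence of a test-probe region divided by the mean
    fluorescence of the G/C control-probe region."""
    return mean(probe_pixels) / mean(control_pixels)

# Hypothetical pixel intensities in arbitrary units (illustrative only).
control_gc = [1000, 980, 1020]        # all-G/C control probe region
unsubstituted_at = [200, 190, 210]    # unsubstituted A/T probe region
substituted_at = [900, 910, 890]      # D- and P-substituted A/T probe region

print(f"unsubstituted A/T: {relative_efficiency(unsubstituted_at, control_gc):.0%}")
print(f"D/P-substituted A/T: {relative_efficiency(substituted_at, control_gc):.0%}")
```

Normalizing to the control region on the same chip removes dependence on absolute scanner gain, which is presumably why the text reports ratios rather than raw intensities.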



Example 10

Synthesis of 5'-MeNPOC-7-deaza-2'-deoxy(N2-
isobutyryl)-guanosine-3'-(N,N-diisopropyl-2-cyanoethyl)
phosphoramidite.

The protected nucleoside 7-deaza-2'-deoxy(N2-
isobutyryl)guanosine (1.0 g, 3 mmole; Chemgenes Corp.,
Waltham, Mass.) was dried by co-evaporating three times
with 5 ml anhydrous pyridine and dissolved in 5 ml of dry
pyridine-DCM (75:25 by vol.). The solution was cooled to
-45° C. (dry ice/CH3CN) under argon, and a solution of 0.9
g (3.3 mmole) MeNPOC-Cl in 2 ml dry DCM was then
added dropwise with stirring. After 30 minutes, the cold bath
was removed, and the solution allowed to stir overnight at
room temperature. The solvents were evaporated, and the
crude material was purified by flash chromatography on
silica gel (2.5%-5% MeOH in DCM) to yield 1.5 g (88%
yield) 5'-MeNPOC-7-deaza-2'-deoxy(N2-isobutyryl)
guanosine as a yellow foam. The product was 98% pure
according to 1H-NMR and HPLC analysis.
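The quantities reported for the MeNPOC protection steps can be cross-checked with simple arithmetic. The sketch below is illustrative only: it infers the molar mass of each MeNPOC-nucleoside from the mass/mmole pair quoted for it in the subsequent phosphitylation step (1.25 g for 2.2 mmole in Example 10; 2.5 g for 5.1 mmole in Example 9) and recomputes the percent yield. The function name is an assumption made for this example.

```python
def percent_yield(product_g, ref_g, ref_mmol, substrate_mmol):
    """Percent yield, using a molar mass implied by a (mass, mmole)
    pair that the text quotes for the same product."""
    molar_mass = ref_g / (ref_mmol / 1000.0)       # g/mol
    product_mmol = product_g / molar_mass * 1000.0
    return 100.0 * product_mmol / substrate_mmol

# Example 10: 1.5 g product from 3 mmole nucleoside (text: 88% yield).
example_10 = percent_yield(1.5, ref_g=1.25, ref_mmol=2.2, substrate_mmol=3.0)

# Example 9: 5.0 g product from 20 mmole nucleoside (text: 50% yield).
example_9 = percent_yield(5.0, ref_g=2.5, ref_mmol=5.1, substrate_mmol=20.0)

# Equivalents of MeNPOC-Cl used in Example 10 (3.3 mmole per 3 mmole).
equivalents = 3.3 / 3.0

print(f"Example 10: {example_10:.0f}%  Example 9: {example_9:.0f}%  "
      f"MeNPOC-Cl: {equivalents:.1f} eq")
```

Both recomputed yields agree with the stated values to within rounding, which suggests the masses and mmole figures in these passages survived transcription intact.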

The MeNPOC-nucleoside (1.25 g, 2.2 mmole) was phos-
phitylated according to the published procedure of Barone,
et al. (Nucleic Acids Res. (1984) 12, 4051-61). The crude
product was purified by flash chromatography on silica gel
(60:35:5 hexane-ethyl acetate-Et3N), co-evaporated twice
with anhydrous acetonitrile and dried under vacuum for ~24
hours to obtain 1.3 g (75%) of the pure product as a yellow
solid (98% purity as determined by 1H/31P-NMR and
HPLC).

VLSIPS oligonucleotide probe arrays in which all or a
subset of all guanosine residues are substituted with 7-deaza-
2'-deoxyguanosine and/or 2'-deoxyinosine are highly desir-
able. This is because guanine-rich regions of nucleic acids
associate to form multi-stranded structures. For example,
short tracts of G residues in RNA and DNA commonly
associate to form tetrameric structures (Zimmermann et al.
(1975) J. Mol. Biol. 92: 181; Kim, J. (1991) Nature 351:
331; Sen et al. (1988) Nature 335: 364; and Sundquist et al.
(1989) Nature 342: 825). The problem this poses to chip
hybridization-based assays is that such structures may com-
pete or interfere with normal hybridization between comple-
mentary nucleic acid sequences. However, by substituting
the 7-deaza-G analog into G-rich nucleic acid sequences,
particularly at one or more positions within a run of G
residues, the tendency for such probes to form higher-order
structures is suppressed, while maintaining essentially the
same affinity and sequence specificity in double-stranded
structures. This has been exploited in order to reduce band
compression in sequencing gels (Mizusawa, et al. (1986)
N.A.R. 14: 1319) and to improve target hybridization to G-rich
probe sequences in VLSIPS arrays. Similar results are
achieved using inosine (see also, Sanger et al. (1977)
P.N.A.S. 74: 5463).
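As an illustration of substituting 7-deaza-dG at alternating positions within a run of G residues, the following sketch builds the substitution pattern for a run of a given length and reports the substituted fraction. The symbol "z" for 7-deaza-dG and the function names are assumptions made for this example only; they are not notation used in the specification.

```python
def alternate_substitution(run_length, start_with_analogue=False,
                           natural="G", analogue="z"):
    """Return a run of `run_length` residues in which the analogue
    occupies every other position."""
    first_analogue_index = 0 if start_with_analogue else 1
    return "".join(
        analogue if i % 2 == first_analogue_index else natural
        for i in range(run_length)
    )

def substituted_fraction(pattern, analogue="z"):
    """Fraction of positions in the run occupied by the analogue."""
    return pattern.count(analogue) / len(pattern)

for start in (False, True):
    run = alternate_substitution(5, start_with_analogue=start)
    print(run, f"{substituted_fraction(run):.0%}")
```

For a run of five residues the two possible alternating patterns place the analogue at two or three positions, that is, 40% or 60% of the run.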

Example 11 

Synthesis of 5'-MeNPOC-2,6-bis(phenoxyacetyl)-2,6-
diaminopurine-2'-deoxyriboside-3'-(N,N-diisopropyl-2-
cyanoethyl)phosphoramidite.

The protected nucleoside 2,6-bis(phenoxyacetyl)-2,6-
diaminopurine-2'-deoxyriboside (8 mmole, 4.2 g) was dried
by coevaporating twice from anhydrous pyridine, dissolved
in 2:1 pyridine/DCM (17.6 ml) and then cooled to -40° C.
MeNPOC-chloride (8 mmole, 2.18 g) was dissolved in
DCM (6.6 ml) and added to the reaction mixture dropwise. The
reaction was allowed to stir overnight with slow warming to
room temperature. After the overnight stirring, another 2
mmole (0.6 g) in DCM (1.6 ml) was added to the reaction
at -40° C. and stirred for an additional 6 hours or until no
unreacted nucleoside was present. The reaction mixture was
evaporated to dryness, and the residue was dissolved in ethyl
acetate and washed with water twice, followed by a wash
with saturated sodium chloride. The organic layer was dried
with MgSO4, and evaporated to a yellow solid which was
purified by flash chromatography in DCM employing a
methanol gradient to elute the desired product in 51% yield.

The 5'-MeNPOC-nucleoside (4.5 mmole, 3.5 g) was
phosphitylated according to the published procedure of
Barone, et al. (Nucleic Acids Res. (1984) 12, 4051-61). The
crude product was purified by flash chromatography on
silica gel (99:0.5:0.5 DCM-MeOH-Et3N). The pooled frac-
tions were evaporated to an oil, redissolved in a minimum
amount of DCM, precipitated by the addition of 800 ml ice
cold hexane, filtered, and then dried under vacuum for ~24
hours.

Overall yield was 56%, at greater than 96% purity by
HPLC and 1H/31P-NMR.

Example 12 

5'-O-MeNPOC-protected phosphoramidites for incorpo-
rating 7-deaza-2'-deoxyguanosine and 2'-deoxyinosine into
VLSIPS™ Oligonucleotide Arrays

For facile incorporation of 7-deaza-2'-deoxyguanosine
and 2'-deoxyinosine into oligonucleotide arrays using
VLSIPS™ methods, a nucleoside phosphoramidite compris-
ing the analogue base which has a 5'-O-MeNPOC
protecting group is constructed. This building block was
prepared from commercially available nucleosides accord-
ing to Scheme 1. These amidites pass the usual tests for
coupling efficiency and photolysis rate.



SCHEME 1

[Reaction scheme, structures not reproduced: the N2-isobutyryl
(NH-ibu) protected nucleoside is treated with MeNPOC-Cl/pyridine
in CH2Cl2 to give the 5'-O-MeNPOC nucleoside, which is then
treated with (iPr2N)2POCE/iPr2NH/[illegible] in CH2Cl2.]

Although the foregoing invention has been described in
some detail by way of illustration and example for purposes
of clarity of understanding, modifications can be made
thereto without departing from the spirit or scope of the
appended claims.

All publications and patent applications cited in this
application are herein incorporated by reference for all
purposes as if each individual publication or patent appli-
cation were specifically and individually indicated to be
incorporated by reference.



SEQUENCE LISTING

(1) GENERAL INFORMATION:
(iii) NUMBER OF SEQUENCES: 29

(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
AAGATGCTAC 10

(2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
AAAAANAAAA A

(2) INFORMATION FOR SEQ ID NO:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
ATATAATATA 10



(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
CGCGCCGCGC 10

(2) INFORMATION FOR SEQ ID NO:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6..10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = guanosine (G),
2',3'-dideoxyguanine (ddG),
2'-deoxyinosine (dI) or thymine (T)"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
TGGGCNNNNN TTGTA 15



(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..20
(D) OTHER INFORMATION: /note= "Target DNA sequence"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
CTGAACGGTA GCATCTTGAC

(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..20
(D) OTHER INFORMATION: /note= "Complementary DNA sequence"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
GTCAAGATGC TACCGTTCAG

(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..20
(D) OTHER INFORMATION: /note= "Target RNA sequence"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
CUGAACGGUA GCAUCUUGAC



(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-O-methyl oligonucleotide"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 9
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 11
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 12
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 13
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 14
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 15
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 16
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 17
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 18
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 19
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 20
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..20
(D) OTHER INFORMATION: /note= "Target 2'-O-methyl
oligonucleotide sequence"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
NNNNNNNNNN NNNNNNNNNN



(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..8
(D) OTHER INFORMATION: /note= "Matching DNA oligonucleotide
probe"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-O-methyl oligonucleotide"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..8
(D) OTHER INFORMATION: /note= "Matching 2'-O-methyl
oligonucleotide analogue probe"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
NNNNNNNN



(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..8
(D) OTHER INFORMATION: /note= "DNA oligonucleotide probe
with 1 base mismatch"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
CTTGCTAT

(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-O-methyl oligonucleotide"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..8
(D) OTHER INFORMATION: /note= "2'-O-methyl oligonucleotide
analogue probe with 1 base mismatch"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
NNNNNNNN



(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently
modified at the 3' phosphate group with
a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
AAGATGNTAN

(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 3' phosphate group with a
hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
AAGATNCTAN

(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 3' phosphate group with a
hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
AAGANGCTAN



(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 3' phosphate group with a
hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
AAGNTGCTAN

(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 5' phosphate group with a
fluorescein molecule"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
NTGAACGGTA GCATCTTGAC



(2) INFORMATION FOR SEQ ID NO:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 11
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = adenine covalently modified
at the 3' phosphate group with a
hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
AAAAANAAAA N

(2) INFORMATION FOR SEQ ID NO:20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = thymine covalently modified
at the 5' phosphate group with a
fluorescein molecule"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
NTTTTGTTTT T

(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 3' phosphate group with a
hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
CGCGCCGCGN



(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-deoxynucleoside/nucleoside
analogue decanucleotide probe"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine covalently
modified at the 3' phosphate group with
a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
NTNTNNTNTN

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-deoxynucleoside/nucleoside
analogue decanucleotide probe"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 9
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine covalently
modified at the 3' phosphate group with
a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
NNNNNNNNNN

(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-deoxynucleoside/nucleoside
analogue decanucleotide probe"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine
covalently modified at the 3'
phosphate group with a
hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:



(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-deoxynucleoside/nucleoside
analogue decanucleotide probe"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 9
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine
covalently modified at the 3'
phosphate group with a
hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
NNNNNNNNNN



(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = thymine covalently modified
at the 5' hydroxyl group with a
fluorescein molecule"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = thymine covalently modified
at the 3' phosphate group with a
hexaethyleneglycol (HEG) linker which is
covalently bound to the 5' phosphate
group of the 5' guanine (N in pos. 1) of
SEQ ID NO:27"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
NATATTATAN

(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = guanine covalently modified
at the 5' phosphate group with a
hexaethyleneglycol (HEG) linker which is
covalently bound to the 3' phosphate
group of the 3' thymine (N in pos. 10)
of SEQ ID NO:26"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
NCGCGGCGCG



(2) INFORMATION FOR SEQ ID NO:28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6..10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = guanine (G),
2',3'-dideoxyguanine (ddG),
2'-deoxyinosine (dI) or thymine (T)"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 15
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 5' phosphate group with a
hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
TGGGCNNNNN TTGTN

(2) INFORMATION FOR SEQ ID NO:29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 5' phosphate group with a
fluorescein molecule"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
NAATACAACC CCCGCCCATC C
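The relationships among several entries of the sequence listing can be verified mechanically. The following minimal sketch (helper names are illustrative, not part of the specification) checks that SEQ ID NO:7 is the reverse complement of the target SEQ ID NO:6, that the RNA target SEQ ID NO:8 is the T-to-U counterpart of SEQ ID NO:6, and that the 10-mer SEQ ID NO:1 occurs within SEQ ID NO:7.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna):
    """Reverse complement of a DNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def transcribe(dna):
    """RNA counterpart of a DNA string (T replaced by U)."""
    return dna.replace("T", "U")

# Sequences copied from the listing, with the spacing removed.
SEQ_ID_1 = "AAGATGCTAC"
SEQ_ID_6 = "CTGAACGGTAGCATCTTGAC"
SEQ_ID_7 = "GTCAAGATGCTACCGTTCAG"
SEQ_ID_8 = "CUGAACGGUAGCAUCUUGAC"

print(reverse_complement(SEQ_ID_6) == SEQ_ID_7)  # complementary strand
print(transcribe(SEQ_ID_6) == SEQ_ID_8)          # RNA form of the target
print(SEQ_ID_1 in SEQ_ID_7)                      # 10-mer lies within SEQ ID NO:7
```

Checks of this kind are a quick way to catch transcription errors in a listing before relying on any individual entry.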



What is claimed is: 

1. A composition for analyzing interactions between oli- 
gonucleotide targets and oligonucleotide probes comprising 
an array of a plurality of oligonucleotide analogue probes 
having different sequences, wherein said oligonucleotide 
analogue probes are coupled to a solid substrate at known 
locations and wherein said plurality of oligonucleotide ana- 
logue probes arc selected to bind to complementary oligo- 
nucleotide targets with a similar hybridization stability 2S 
across the array. 

2. The composition of claim 1, wherein at least one of said 
oligonucleotide analogue probes is selected to maintain 
hybridization specificity or mismatch discrimination with 
said complementary oligonucleotide targets, 30 

3. The composition of claim 1, wherein at least one of said 
oligonucleotide analogue probes has increased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

4. The composition of claim 1, wherein at least one of said 
oligonucleotide analogue probes has decreased the thermal 
stability between said oligonucleotide analogue probe and
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

5. The composition of claim 2, wherein at least one of said
oligonucleotide analogue probes has increased the thermal
stability between said oligonucleotide analogue probe and
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said
oligonucleotide analogue probe anneals. 

6. The composition of claim 2, wherein at least one of said 
oligonucleotide analogue probes has decreased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to 
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

7. The composition of claims 1-5 or 6, wherein said solid 
substrate is selected from the group consisting of silica,
polymeric materials, glass, beads, chips, and slides. 

8. The composition of claims 1-5 or 6, wherein said 
composition comprises an array of oligonucleotide analogue 
probes 5 to 20 nucleotides in length. 

9. The composition of claims 1-5 or 6, wherein said array
of oligonucleotide analogue probes comprises a nucleoside 
analogue with the formula 




the nucleoside analogue is not a naturally occurring DNA 
or RNA nucleoside; 
R¹ is selected from the group consisting of hydrogen,
methyl, hydroxyl, alkoxy, alkylthio, halogen, cyano,
and azido;

R² is selected from the group consisting of hydrogen,
methyl, hydroxyl, alkoxy, alkylthio, halogen, cyano,
and azido;
Y is a heterocyclic moiety; 

and wherein said nucleoside analogue is incorporated into 
the oligonucleotide analogue by attachment to a 3' 
hydroxyl of the nucleoside analogue, to a 5' hydroxyl of 
the nucleoside analogue, or both the 3' nucleoside and 
the 5' hydroxyl of the nucleoside analogue. 

10. The composition of claims 1-5 or 6, wherein said
array of oligonucleotide analogue probes comprises a
nucleoside analogue with the formula




wherein: 

the nucleoside analogue is not a naturally occurring DNA 
or RNA nucleoside; 

R¹ is selected from the group consisting of hydrogen,
hydroxyl, methyl, methoxy, ethoxy, propoxy, allyloxy,
propargyloxy, fluorine, chlorine, and bromine;

R² is selected from the group consisting of hydrogen,
hydroxyl, methyl, methoxy, ethoxy, propoxy, allyloxy,
propargyloxy, fluorine, chlorine, and bromine; and
Y is a base selected from the group consisting of
purines, purine analogues, pyrimidines, pyrimidine
analogues, 3-nitropyrrole and 5-nitroindole;

and wherein said nucleoside analogue is incorporated into
the oligonucleotide analogue by attachment to a 3'
hydroxyl of the nucleoside analogue, to a 5' hydroxyl of
the nucleoside analogue, or both the 3' nucleoside and
the 5' hydroxyl of the nucleoside analogue.

11. The composition of claims 1-5 or 6, wherein each
probe of said plurality of oligonucleotide analogue probes
has at least one oligonucleotide analogue, and wherein at
least one of said oligonucleotide analogues comprises a
peptide nucleic acid.

12. The composition of claims 1-5 or 6, wherein at least
one of said plurality of oligonucleotide analogue probes said
array of oligonucleotide analogue probes is resistant to
RNAase A.

13. The composition of claims 1-5 or 6, wherein said
solid substrate is attached to over 1000 different
oligonucleotide analogue probes.

14. The composition of claims 1-5 or 6, wherein each
probe of said plurality of oligonucleotide analogue probes
has at least one oligonucleotide analogue, and wherein at
least one of said oligonucleotide analogues comprises
2'-O-methyl nucleotides.

15. The composition of claims 1-5 or 6, wherein said
array of oligonucleotide analogue probes and said solid
substrate comprises a plurality of different oligonucleotide
analogue probes, each oligonucleotide analogue probes
having the formula:

Y-L¹-X¹-L²-X²

wherein,

Y is a solid substrate;

X¹ and X² are complementary oligonucleotides containing
at least one nucleotide analogue;

L¹ is a spacer;

L² is a linking group having sufficient length such that X¹
and X² form a double-stranded oligonucleotide.

16. The composition of claim 15, wherein said composition
comprises a library of unimolecular double-stranded
oligonucleotide analogue probes.

17. The composition of claims 1-5 or 6, wherein said
array of oligonucleotide analogue probes comprises a
conformationally restricted array of oligonucleotide analogue
probes with the formula:

-X¹¹-Z-X¹²

wherein X¹¹ and X¹² are complementary oligonucleotides
or oligonucleotide analogues and Z is a presented
moiety.

18. The composition of claims 1-5 or 6, wherein each
probe of said plurality of oligonucleotide analogue probes
has at least one oligonucleotide analogue, and wherein at
least one of said oligonucleotide analogues comprises a
nucleotide with a base selected from the group of bases
consisting of 5-propynyluracil, 5-propynylcytosine,
2-aminoadenine, 7-deazaguanine, 2-aminopurine,
8-aza-7-deazaguanine, 1H-purine, and hypoxanthine.

19. The composition of claims 1-5 or 6, wherein said
plurality of oligonucleotide analogue probes are coupled to
said solid substrate by light-directed chemical coupling.

20. The composition of claim 19, wherein said solid
substrate is derivatized with a silane reagent prior to
synthesis of said plurality of oligonucleotide analogue probes.

21. The composition of claims 1-5 or 6, wherein said
plurality of oligonucleotide analogue probes are coupled to
said solid substrate by flowing oligonucleotide analogue
reagents over known locations of the solid substrate.

22. The composition of claim 21, wherein said solid
substrate is derivatized with a silane reagent prior to
synthesis of said plurality of oligonucleotide analogue probes.

23. The composition of claims 1-5 or 6, wherein at least
one of plurality of said oligonucleotide analogue probes
forms a first duplex with a target oligonucleotide sequence,
wherein said oligonucleotide analogue probe has a
corresponding oligonucleotide sequence that forms a second
duplex with said target oligonucleotide sequence, wherein
said second duplex is rich in A-T or G-C nucleotide pairs,
and wherein said oligonucleotide analogue probe has at least
one nucleotide analogue in place of an A, T, G, or C
nucleotide of said corresponding oligonucleotide sequence
at a position within said oligonucleotide analogue probe
such that said first duplex has an increased hybridization
stability than said second duplex.

24. The composition of claim 23, wherein said
oligonucleotide analogue probe contains fewer bases than said
corresponding oligonucleotide sequence.

25. The composition of claims 1-5 or 6, wherein said
oligonucleotide analogue probe forms a first duplex with a
target oligonucleotide sequence, wherein said
oligonucleotide analogue probe has a corresponding oligonucleotide
sequence that forms a second duplex with said target
polynucleotide sequence, and wherein said oligonucleotide
analogue probe is shorter than said corresponding
polynucleotide sequence.

26. A composition for analyzing the interaction between
an oligonucleotide target and an oligonucleotide probe
comprising an array of a plurality of oligonucleotide probes
having different sequences hybridized to complementary
oligonucleotide analogue targets, wherein said
oligonucleotide analogue targets bind to complementary
oligonucleotide probes with a similar hybridization stability
across the array.

27. The composition of claim 26, wherein at least one of
said oligonucleotide analogue target is selected to maintain
hybridization specificity or mismatch discrimination with
said complementary oligonucleotide probes.

28. The composition of claim 26, wherein at least one of
said oligonucleotide analogue targets has increased the
thermal stability between said oligonucleotide analogue
target and said complementary oligonucleotide probe as
compared to an oligonucleotide target that is the perfect
complement to the complementary oligonucleotide probe
with which said oligonucleotide analogue target anneals.

29. The composition of claim 26, wherein at least one of
said oligonucleotide analogue targets has decreased the
thermal stability between said oligonucleotide analogue
target and said complementary oligonucleotide probe as
compared to an oligonucleotide target that is the perfect
complement to the complementary oligonucleotide probe
with which said oligonucleotide analogue target anneals.

30. The composition of claim 27, wherein at least one of
said oligonucleotide analogue targets has increased the
thermal stability between said oligonucleotide analogue
target and said complementary oligonucleotide probe as
compared to an oligonucleotide target that is the perfect
complement to the complementary oligonucleotide probe
with which said oligonucleotide analogue target anneals.

31. The composition of claim 27, wherein at least one of
said oligonucleotide analogue targets has decreased the
thermal stability between said oligonucleotide analogue
target and said complementary oligonucleotide probe as
compared to an oligonucleotide target that is the perfect
complement to the complementary oligonucleotide probe
with which said oligonucleotide analogue target anneals.

32. The composition of claims 26-30 or 31, wherein the
oligonucleotide analogue target is a PCR amplicon.

33. The composition of claims 26-30 or 31, wherein at
least one of said plurality of oligonucleotide probes
comprise at least one oligonucleotide analogue.

34. The composition of claims 26-30 or 31, wherein at
least one target oligonucleotide analogue acid is an RNA
nucleic acid.

35. A method analyzing interactions between an
oligonucleotide target and an oligonucleotide probe, comprising
the steps of:

(a). synthesizing an oligonucleotide analogue array
comprising a plurality of oligonucleotide analogue probes
having different sequences, wherein said oligonucleotide
analogue probes are coupled to a solid substrate
at known locations, said solid substrate having a
surface;

(b). exposing said oligonucleotide analogue probe array to
a plurality of oligonucleotide targets under hybridization
conditions such that said plurality of oligonucleotide
analogue probes bind to complementary oligonucleotide
targets with a similar hybridization stability
across the array; and

(c). determining whether an oligonucleotide analogue
probe of said oligonucleotide analogue probe array
binds to at least one of said target nucleic acids.

36. The method in accordance of claim 35, wherein at
least one of said oligonucleotide analogue probes is selected
to maintain hybridization specificity or mismatch
discrimination with said complementary oligonucleotide targets.

37. The method in accordance of claim 35, wherein at
least one of said oligonucleotide analogue probes has
increased the thermal stability between said oligonucleotide
analogue probe and said complementary oligonucleotide
target as compared to an oligonucleotide probe that is the
perfect complement to the complementary oligonucleotide
target with which said oligonucleotide analogue probe
anneals.

38. The method in accordance of claim 35, wherein at
least one of said oligonucleotide analogue probes has
decreased the thermal stability between said oligonucleotide
analogue probe and said complementary oligonucleotide
target as compared to an oligonucleotide probe that is the
perfect complement to the complementary oligonucleotide
target with which said oligonucleotide analogue probe
anneals.

39. The method in accordance of claim 36, wherein at
least one of said oligonucleotide analogue probes has
increased the thermal stability between said oligonucleotide
analogue probe and said complementary oligonucleotide
target as compared to an oligonucleotide probe that is the
perfect complement to the complementary oligonucleotide
target with which said oligonucleotide analogue probe
anneals.

40. The method in accordance of claim 36, wherein at
least one of said oligonucleotide analogue probes has
decreased the thermal stability between said oligonucleotide
analogue probe and said complementary oligonucleotide
target as compared to an oligonucleotide probe that is the
perfect complement to the complementary oligonucleotide
target with which said oligonucleotide analogue probe
anneals.

41. The method of claims 35-39 or 40, wherein said
oligonucleotide target is selected from the group comprising
genomic DNA, cDNA, unspliced RNA, mRNA, and rRNA.

42. The method of claims 35-39 or 40, wherein said target
nucleic acid is amplified prior to said hybridization step.

43. The method of claims 35-39 or 40, wherein said
plurality of oligonucleotide analogue probes is synthesized
on said solid support by light-directed synthesis.

44. The method of claims 35-39 or 40, wherein said
plurality of said oligonucleotide analogue probes is
synthesized on said solid support by causing oligonucleotide
analogue synthetic reagents to flow over known locations of
said solid support.

45. The method of claims 35-39 or 40, wherein said step
(a), comprises the steps of:

i). forming a plurality of channels adjacent to the surface
of said substrate;

ii). placing selected reagents in said channels to
synthesize oligonucleotide analogue probes at known
locations; and

iii). repeating steps i). and ii). thereby forming an array of
oligonucleotide analogue probes having different
sequences at known locations on said substrate.

46. The method of claims 35-39 or 40, wherein said solid
substrate is selected from the group consisting of beads, [...]

47. The method of claims 35-39 or 40, wherein said solid
substrate is comprised of materials selected from the group
consisting of silica, polymers and glass.

48. The method of claims 35-39 or 40, wherein the
oligonucleotide analogue probes of said array are
synthesized using photoremovable protecting groups.

49. The method of claims 35-39 or 40, further comprising
selectively incorporating MeNPoc onto the 3' or 5' hydroxyl
of at least one nucleoside analogue and selectively
incorporating said nucleoside analogue into at least one of said
oligonucleotide analogue probes.

50. The method of claims 35-39 or 40, wherein at least
one of said oligonucleotide analogue probes is synthesized
from phosphoramidite nucleoside reagents.

51. A method of detecting an oligonucleotide target,
comprising enzymatically copying an oligonucleotide target
using at least one nucleotide analogue, thereby producing
multiple oligonucleotide analogue targets, selecting said
oligonucleotide analogue targets such that said
oligonucleotide analogue targets bind to the complementary
oligonucleotide probes coupled to a solid surface at known
locations of an array with a similar hybridization stability
across the array, hybridizing the oligonucleotide analogue
targets to complementary oligonucleotide probes, and
detecting whether at least one of said oligonucleotide
analogue targets binds to said complementary oligonucleotide
acid probe.

52. The method of claim 51, wherein at least one of said
oligonucleotide analogue targets is selected to maintain
hybridization specificity or mismatch discrimination with
said complementary oligonucleotide probes.

53. The method of claim 51, wherein at least one of said
oligonucleotide analogue targets has increased the thermal
stability between said oligonucleotide analogue target and
said complementary oligonucleotide probe as compared to
an oligonucleotide target that is the perfect complement to
the complementary oligonucleotide probe with which said
oligonucleotide analogue target anneals.

54. The method of claim 51, wherein at least one of said
oligonucleotide analogue targets has decreased the thermal
stability between said oligonucleotide analogue target and
said complementary oligonucleotide probe as compared to
an oligonucleotide target that is the perfect complement to
the complementary oligonucleotide probe with which said
oligonucleotide analogue target anneals.

55. The method of claim 52, wherein at least one of said
oligonucleotide analogue targets has increased the thermal
stability between said oligonucleotide analogue target and
said complementary oligonucleotide probe as compared to
an oligonucleotide target that is the perfect complement to
the complementary oligonucleotide probe with which said
oligonucleotide analogue target anneals.

56. The method of claim 52, wherein at least one of said
oligonucleotide analogue targets has decreased the thermal
stability between said oligonucleotide analogue target and
said complementary oligonucleotide probe as compared to
an oligonucleotide target that is the perfect complement to
the complementary oligonucleotide probe with which said
oligonucleotide analogue target anneals.

57. The method of claims 51-55 or 56, wherein the 
oligonucleotide probe array comprises at least one oligo-
nucleotide analogue probe which is complementary to at
least one of said oligonucleotide analogue targets. 

58. A method of making an array of oligonucleotide 
probes, comprising providing a plurality of oligonucleotide 
analogue probes having at least one oligonucleotide 
analogue, said oligonucleotide analogue probes having dif-
ferent sequences at known locations on an array, selecting
the oligonucleotide analogue probes to hybridize with
complementary oligonucleotide target sequences under
hybridization conditions such that said oligonucleotide ana-
logue probes bind to complementary oligonucleotide targets
with a similar hybridization stability across the array.

59. The method of claim 58, wherein at least one of said 
oligonucleotide analogue probes is selected to maintain 
hybridization specificity or mismatch discrimination with 
said complementary oligonucleotide targets.

60. The method of claim 58, wherein at least one of said 
oligonucleotide analogue probes has increased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

61. The method of claim 58, wherein at least one of said 
oligonucleotide analogue probes has decreased the thermal 
stability between said oligonucleotide analogue probe and
said complementary oligonucleotide target as compared to 
an oligonucleotide probe that is the perfect complement to 
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

62. The method of claim 59, wherein at least one of said
oligonucleotide analogue probes has increased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said
oligonucleotide analogue probe anneals. 

63. The method of claim 59, wherein at least one of said 
oligonucleotide analogue probes has decreased the thermal 
stability between said oligonucleotide analogue probe and
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said
oligonucleotide analogue probe anneals. 

64. The method in accordance with claims 58-62, or 63, 
further comprising incorporating at least one oligonucle- 
otide analogue into at least one of the oligonucleotide 
analogue probes of the array to reduce or prevent the
formation of secondary structure in the oligonucleotide of 
the array. 

65. The method in accordance with claims 58-62, or 63, 
further comprising incorporating at least one oligonucle- 
otide analogue into at least one of the oligonucleotide target 
to reduce or prevent the formation of secondary structure in 
the target polynucleotide sequence. 

66. The method in accordance with claims 58-62, or 63, 
further comprising incorporating at least one oligonucle- 
otide analogue into at least one of the oligonucleotide 
analogue probes of the array to create secondary structure in 
the oligonucleotide of the array. 

67. The method in accordance with claims 58-62, or 63, 
further comprising incorporating a base selected from the 
group consisting of 5-propynyluracil, 5-propynylcytosine, 
2-aminoadenine, 7-deazaguanine, 2-aminopurine, 8-aza-7-
deazaguanine, 1H-purine, and hypoxanthine into the oligo-
nucleotide analogue probes of the array.

68. The method of claim 67 further comprising selecting 
said at least one oligonucleotide analogue such that the 
oligonucleotide analogue probe is a homopolymer. 

69. The method in accordance with claims 58-62, or 63, 
further comprising selecting said at least one oligonucleotide 
analogue from the group consisting essentially of oligo-
nucleotide analogues comprising 2'-O-methyl nucleotides
and oligonucleotides comprising a base selected from the
group of bases consisting of 5-propynyluracil,
5-propynylcytosine, 7-deazaguanine, 2-aminoadenine,
8-aza-7-deazaguanine, 1H-purine, and hypoxanthine.

70. The method in accordance with claims 58-62 or 63, 
further comprising selecting said at least one oligonucleotide 
analogue such that oligonucleotide analogue probes com- 
prises at least one peptide nucleic acid. 

71. The method in accordance with claims 58-62, or 63, 
further comprising selecting said at least one oligonucleotide 
analogue to increase image brightness when the oligonucle- 
otide target and the oligonucleotide analogue probe hybrid- 
ize in the presence of a fluorescent indicator, in comparison 
to an oligonucleotide probe without oligonucleotide analogs.

72. The method in accordance with claims 58-62, or 63, 
further comprising providing said plurality of oligonucle- 
otide analogue probes in an array with at least 1000 other 
oligonucleotide analogue probes.

NUCLEIC ACID ARRAYS

This application is a continuation of Ser. No. 09/129,470
filed Aug. 4, 1998, which is a continuation of Ser. No.
08/456,598 filed Jun. 1, 1995, which is a divisional of Ser.
No. 07/954,646 filed Sep. 30, 1992, now issued as U.S. Pat.
No. 5,445,934, which is a divisional of Ser. No. 07/850,356,
filed Mar. 12, 1992, now issued as U.S. Pat. No. 5,405,783,
which is a divisional of Ser. No. 07/492,462 filed Mar. 7,
1990, now issued as U.S. Pat. No. 5,143,854, which is a
continuation-in-part of Ser. No. 07/362,901 filed Jun. 7,
1989, now abandoned, the disclosures of which are
incorporated by reference.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document
contains material which is subject to copyright protection.
The copyright owner has no objection to the facsimile
reproduction by anyone of the patent document or the patent
disclosure as it appears in the Patent and Trademark Office
patent file or records, but otherwise reserves all copyright
rights whatsoever.

BACKGROUND OF THE INVENTION

The present inventions relate to the synthesis and
placement of materials at known locations. In particular, one
embodiment of the inventions provides a method and
associated apparatus for preparing diverse chemical sequences at
known locations on a single substrate surface. The
inventions may be applied, for example, in the field of preparation
of oligomer, peptide, nucleic acid, oligosaccharide,
phospholipid, polymer, or drug congener preparation,
especially to create sources of chemical diversity for use in
screening for biological activity.

The relationship between structure and activity of
molecules is a fundamental issue in the study of biological
systems. Structure-activity relationships are important in
understanding, for example, the function of enzymes, the
ways in which cells communicate with each other, as well as
cellular control and feedback systems.

Certain macromolecules are known to interact and bind to
other molecules having a very specific three-dimensional
spatial and electronic distribution. Any large molecule
having such specificity can be considered a receptor, whether it
is an enzyme catalyzing hydrolysis of a metabolic
intermediate, a cell-surface protein mediating membrane
transport of ions, a glycoprotein serving to identify a
particular cell to its neighbors, an IgG-class antibody
circulating in the plasma, an oligonucleotide sequence of DNA in
the nucleus, or the like. The various molecules which
receptors selectively bind are known as ligands.

Many assays are available for measuring the binding
affinity of known receptors and ligands, but the information
which can be gained from such experiments is often limited
by the number and type of ligands which are available.
Novel ligands are sometimes discovered by chance or by
application of new techniques for the elucidation of
molecular structure, including x-ray crystallographic analysis and
recombinant genetic techniques for proteins.

Small peptides are an exemplary system for exploring the
relationship between structure and function in biology. A
peptide is a sequence of amino acids. When the twenty
naturally occurring amino acids are condensed into
polymeric molecules they form a wide variety of
three-dimensional configurations, each resulting from a particular
amino acid sequence and solvent condition. The number of
possible pentapeptides of the 20 naturally occurring amino
acids, for example, is 20^5 or 3.2 million different peptides.
The likelihood that molecules of this size might be useful in
receptor-binding studies is supported by epitope analysis
studies showing that some antibodies recognize sequences
as short as a few amino acids with high specificity.
Furthermore, the average molecular weight of amino acids
puts small peptides in the size range of many currently
useful pharmaceutical products.

Pharmaceutical drug discovery is one type of research
which relies on such a study of structure-activity
relationships. In most cases, contemporary pharmaceutical research
can be described as the process of discovering novel ligands
with desirable patterns of specificity for biologically
important receptors. Another example is research to discover new
compounds for use in agriculture, such as pesticides and
herbicides.

Sometimes, the solution to a rational process of designing
ligands is difficult or unyielding. Prior methods of preparing
large numbers of different polymers have been painstakingly
slow when used at a scale sufficient to permit effective
rational or random screening. For example, the "Merrifield"
method (J. Am. Chem. Soc. (1963) 85:2149-2154, which is
incorporated herein by reference for all purposes) has been
used to synthesize peptides on a solid support. In the
Merrifield method, an amino acid is covalently bonded to a
support made of an insoluble polymer. Another amino acid
with an alpha protected group is reacted with the covalently
bonded amino acid to form a dipeptide. After washing, the
protective group is removed and a third amino acid with an
alpha protective group is added to the dipeptide. This
process is continued until a peptide of a desired length and
sequence is obtained. Using the Merrifield method, it is not
economically practical to synthesize more than a handful of
peptide sequences in a day.

To synthesize larger numbers of polymer sequences, it has
also been proposed to use a series of reaction vessels for
polymer synthesis. For example, a tubular reactor system
may be used to synthesize a linear polymer on a solid phase
support by automated sequential addition of reagents. This
method still does not enable the synthesis of a sufficiently
large number of polymer sequences for effective economical
screening.

Methods of preparing a plurality of polymer sequences
are also known in which a foraminous container encloses a
known quantity of reactive particles, the particles being
larger in size than foramina of the container. The containers
may be selectively reacted with desired materials to
synthesize desired sequences of product molecules. As with other
methods known in the art, this method cannot practically be
used to synthesize a sufficient variety of polypeptides for
effective screening.

Other techniques have also been described. These
methods include the synthesis of peptides on 96 plastic pins
which fit the format of standard microtiter plates.
Unfortunately, while these techniques have been somewhat
useful, substantial problems remain. For example, these
methods continue to be limited in the diversity of sequences
which can be economically synthesized and screened.

From the above, it is seen that an improved method and
apparatus for synthesizing a variety of chemical sequences
at known locations is desired.

SUMMARY OF THE INVENTION

An improved method and apparatus for the preparation of
a variety of polymers is disclosed.

In one preferred embodiment, linker molecules are
provided on a substrate. A terminal end of the linker molecules
is provided with a reactive functional group protected with
a photoremovable protective group. Using lithographic
methods, the photoremovable protective group is exposed to
light and removed from the linker molecules in first selected
regions. The substrate is then washed or otherwise contacted
with a first monomer that reacts with exposed functional
groups on the linker molecules. In a preferred embodiment,
the monomer is an amino acid containing a photoremovable
protective group at its amino or carboxy terminus and the
linker molecule terminates in an amino or carboxy acid
group bearing a photoremovable protective group.

A second set of selected regions is, thereafter, exposed to
light and the photoremovable protective group on the linker
molecule/protected amino acid is removed at the second set
of regions. The substrate is then contacted with a second
monomer containing a photoremovable protective group for
reaction with exposed functional groups. This process is
repeated to selectively apply monomers until polymers of a
desired length and desired chemical sequence are obtained.
Photolabile groups are then optionally removed and the
sequence is, thereafter, optionally capped. Side chain
protective groups, if present, are also removed.

By using the lithographic techniques disclosed herein, it
is possible to direct light to relatively small and precisely
known locations on the substrate. It is, therefore, possible to
synthesize polymers of a known chemical sequence at
known locations on the substrate.

The resulting substrate will have a variety of uses
including, for example, screening large numbers of
polymers for biological activity. To screen for biological activity,
the substrate is exposed to one or more receptors such as
antibody whole cells, receptors on vesicles, lipids, or any
one of a variety of other receptors. The receptors are
preferably labeled with, for example, a fluorescent marker,

[...] substrate having a large variety of polymer sequences at known
locations on a surface thereof. The substrate is exposed to a
fluorescently labeled receptor which binds to one or more of
the polymer sequences. The substrate is placed in a
microscope detection apparatus for identification of locations
where binding takes place. The microscope detection
apparatus includes a monochromatic or polychromatic light
source for directing light at the substrate, means for
detecting fluoresced light from the substrate, and means for
determining a location of the fluoresced light. The means for
detecting light fluoresced on the substrate may in some
embodiments include a photon counter. The means for
determining a location of the fluoresced light may include an
x/y translation table for the substrate. Translation of the slide
and data collection are recorded and managed by an
appropriately programmed digital computer.

A further understanding of the nature and advantages of
the inventions herein may be realized by reference to the
remaining portions of the specification and the attached
drawings.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates masking and irradiation of a substrate at
a first location. The substrate is shown in cross-section;

FIG. 2 illustrates the substrate after application of a
monomer "A";

FIG. 3 illustrates irradiation of the substrate at a second
location;

FIG. 4 illustrates the substrate after application of
monomer "B";

FIG. 5 illustrates irradiation of the "A" monomer;

FIG. 6 illustrates the substrate after a second application
of "B";

FIG. 7 illustrates a completed substrate;

FIGS. 8A and 8B illustrate alternative embodiments of a

radioactive marker, or a labeled antibody reactive with the reactor system for forming a plurality of polymers on a 

receptor. The location of the marker on the substrate is substrate; 

detected with, for example, photon detection or autoradio- ^ FIG. 9 illustrates a detection apparatus for locating fluo- 

graphic techniques. Through knowledge of the sequence of rescent markers on the substrate; 

the material at the location where binding is detected, it is FIGS. lOA-lOM illustrate the method as it is applied to 

possible to quickly determine which sequence binds with the the production of the trimers of monomers "A" and "B"; 

receptor and, therefore, the technique can be used to screen pjQs ^nd IIB are fluorescence traces for standard 

large numbers of peptides. Other possible applications of the fluorescent beads; 

inventions herein include diagnostics in which various anti- ^^B are fluorescence curves for NVOC 

bodies for particular receptors would be placed on a sub- ^^^^^ ^ ^ ^ ^gl^^ respectively; 

strate and, for example, blood sera would be screened tor , ^c^ua^^ ^^^^^a 

immune deficiencies SlDl further applications include, for ,^ ^^f }^'^ ,"n H^n^™ J^W. ^ 

example, selective "doping" of oigaL materials in semi- 'brough 100 m^, 50 /«n. 20 /«n and 10 |«n masks^ 

conductor devices, and the like. 'V ^0- "A and MB dlustrates fluorescence of a slide pith 

, . . thepepttde YGGFLon selected regions of-its surface which 

In connection with one aspect of the ">vemion an ^ 

improved reactor system for synthesjzmg polymers is also «• 

disclosed. The reactor system includes a subsUate mount ' . j,.,^.„ . . * ,• t j n , 

which engages a substrate around a periphery thereof. TTie 55 ISA and ISD dlustmte formation of and a fluores- 

substrate mount provides for a leaclor space between the Pl°L°f » shde with a checkerboard pauera of YGGFL 

substrateandthemountthroughorintowhichreactionfluids GGFL expo^d to labeled Herz antibody. FIG. ISA 

arepumpedorflowed.Amaskisplaoedonorfocusedonthe illustrates a 500x500 /mi mask which h^been^^^^ 

substrate and illuminated so as to deprotect selected regions """utetrate according to FIG. SAwhi e FIG 15B Uli^trates 

of the substrate in the reactor space. A monomer is pumped «, a50x50Am inadcplac«l in direct contact with the substrate 

through the reactor space or otherwise oonUcted with the m accord with FIG. SB; 

substrate and reacts with the deprotectcd regions. By selec- FIG. 16 is a fluorescence plot of YGGFL and PGGFL 

tively deprotecting regions on the substrate and flowing synthesized in a 50 /on checkerboard pattern; 

predetermined monomers through the reactor space, desired FIG. 17 is a fluorescence plot of YPGGFL and is YGGFL 

polymers at known locations may be synthesized. $5 synthesized in a 50 fan checkerboard pattern; 

Improved detection apparatus and methods are also dis- FIGS. 18A and 18B illustrate the mapping of sixteen 

closed. The detection method and apparatus utilize a sub- sequences synthesized on two different glass slides; 
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FIG. 19 is a fluorescence plot of the slide illustrated io 
FIG. 18A; and 

FIG. 20 is a fluorescence plot of the slide illustrated in 
HG. lOB. 

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

I. Glossary

The following terms are intended to have the following 
general meanings as they are used herein: 

1. Complementary: Refers to the topological compatibility 
or matching together of interacting surfaces of a ligand 
molecule and its receptor. Thus, the receptor and its ligand
can be described as complementary, and furthermore, the 
contact surface characteristics are complementary to each 
other. 

2. Epitope: The portion of an antigen molecule which is 
delineated by the area of interaction with the subclass of
receptors known as antibodies. 

3. Ligand: A ligand is a molecule that is recognized by a 
particular receptor. Examples of ligands that can be inves- 
tigated by this invention include, but are not restricted to, 
agonists and antagonists for cell membrane receptors,
toxins and venoms, viral epitopes, hormones (e.g., 
opiates, steroids, etc.), hormone receptors, peptides, 
enzymes, enzyme substrates, cofactors, drugs, lectins, 
sugars, oligonucleotides, nucleic acids, oligosaccharides, 
proteins, and monoclonal antibodies.

4. Monomer: A member of the set of small molecules which 
can be joined together to form a polymer. The set of 



monomers includes but is not restricted to, for example, 
the set of common L-amino acids, the set of D-amino 
acids, the set of synthetic amino acids, the set of nucle- 
otides and the set of pentoses and hexoses. As used herein, 
monomers refers to any member of a basis set for syn- 
thesis of a polymer. For example, dimers of L-amino acids 
form a basis set of 400 monomers for synthesis of 
polypeptides. Different basis sets of monomers may be 
used at successive steps in the synthesis of a polymer. 

5. Peptide: A polymer in which the monomers are alpha 
amino acids and which are joined together through amide 
bonds and alternatively referred to as a polypeptide. In the 
context of this specification it should be appreciated that 
the amino acids may be the L-optical isomer or the 
D-optical isomer. Peptides are more than two amino acid 
monomers long, and often more than 20 amino acid 
monomers long. Standard abbreviations for amino acids 
are used (e.g., P for proline). These abbreviations are 
included in Stryer, Biochemistry, Third Ed., 1988, which is
incorporated herein by reference for all purposes. 

6. Radiation: Energy which may be selectively applied 
including energy having a wavelength of between 10^-14
and 10^4 meters including, for example, electron beam
radiation, gamma radiation, x-ray radiation, ultra-violet 
radiation, visible light, infrared radiation, microwave 
radiation, and radio waves. "Irradiation" refers to the 
application of radiation to a surface. 

7. Receptor: A molecule that has an affinity for a given 
ligand. Receptors may be naturally occurring or manmade
molecules. Also, they can be employed in their unaltered 
state or as aggregates with other species. Receptors may 
be attached, covalently or noncovalently, to a binding 
member, either directly or via a specific binding sub- 
stance. Examples of receptors which can be employed by 
this invention include, but are not restricted to, antibodies, 
cell membrane receptors, monoclonal antibodies and anti- 
sera reactive with specific antigenic determinants (such as 
on viruses, cells or other materials), drugs, 
polynucleotides, nucleic acids, peptides, cofactors, 
lectins, sugars, polysaccharides, cells, cellular 
membranes, and organelles. Receptors are sometimes 
referred to in the art as anti-ligands. As the term receptors 
is used herein, no difference in meaning is intended. A 
"Ligand Receptor Pair" is formed when two macromol- 
ecules have combined through molecular recognition to 
form a complex. 

Other examples of receptors which can be investigated by 
this invention include, but are not restricted to:

a) Microorganism receptors: Determination of ligands 
which bind to receptors, such as specific transport 
proteins or enzymes essential to survival of 
microorganisms, is useful in developing a new class of antibiotics.
Of particular value would be antibiotics against oppor- 
tunistic fungi, protozoa, and those bacteria resistant to 
the antibiotics in current use. 

b) Enzymes: For instance, the binding site of enzymes 
such as the enzymes responsible for cleaving neu- 
rotransmitters; determination of ligands which bind to 
certain receptors to modulate the action of the enzymes 
which cleave the different neurotransmitters is useful in 
the development of drugs which can be used in the 
treatment of disorders of neurotransmission. 

c) Antibodies: For instance, the invention may be useful
in investigating the ligand-binding site on the antibody
molecule which combines with the epitope of an
antigen of interest; determining a sequence that mimics an
antigenic epitope may lead to the development of
vaccines of which the immunogen is based on one or
more of such sequences or lead to the development of
related diagnostic agents or compounds useful in
therapeutic treatments such as for auto-immune diseases
(e.g., by blocking the binding of the "self" antibodies).

d) Nucleic Acids: Sequences of nucleic acids may be
synthesized to establish DNA or RNA binding
sequences.

e) Catalytic Polypeptides: Polymers, preferably
polypeptides, which are capable of promoting a
chemical reaction involving the conversion of one or more
reactants to one or more products. Such polypeptides
generally include a binding site specific for at least one
reactant or reaction intermediate and an active
functionality proximate to the binding site, which
functionality is capable of chemically modifying the bound
reactant. Catalytic polypeptides are described in, for
example, U.S. application Ser. No. 404,920, which is
incorporated herein by reference for all purposes.

f) Hormone receptors: For instance, the receptors for
insulin and growth hormone. Determination of the
ligands which bind with high affinity to a receptor is
useful in the development of, for example, an oral
replacement of the daily injections which diabetics
must take to relieve the symptoms of diabetes, and in
the other case, a replacement for the scarce human
growth hormone which can only be obtained from
cadavers or by recombinant DNA technology. Other
examples are the vasoconstrictive hormone receptors;
determination of those ligands which bind to a receptor
may lead to the development of drugs to control blood
pressure.

g) Opiate receptors: Determination of ligands which bind
to the opiate receptors in the brain is useful in the
development of less-addictive replacements for
morphine and related drugs.

8. Substrate: A material having a rigid or semi-rigid surface.
In many embodiments, at least one surface of the substrate
will be substantially flat, although in some embodiments
it may be desirable to physically separate synthesis
regions for different polymers with, for example, wells,
raised regions, etched trenches, or the like. According to
other embodiments, small beads may be provided on the
surface which may be released upon completion of the
synthesis.

9. Protective Group: A material which is bound to a
monomer unit and which may be spatially removed upon
selective exposure to an activator such as electromagnetic
radiation. Examples of protective groups with utility
herein include Nitroveratryloxy carbonyl, Nitrobenzyloxy
carbonyl, Dimethyl dimethoxybenzyloxy carbonyl,
5-Bromo-7-nitroindolinyl, o-Hydroxy-α-methyl
cinnamoyl, and 2-Oxymethylene anthraquinone. Other
examples of activators include ion beams, electric fields,
magnetic fields, electron beams, x-ray, and the like.

10. Predefined Region: A predefined region is a localized
area on a surface which is, was, or is intended to be
activated for formation of a polymer. The predefined
region may have any convenient shape, e.g., circular,
rectangular, elliptical, wedge-shaped, etc. For the sake of
brevity herein, "predefined regions" are sometimes
referred to simply as "regions."

11. Substantially Pure: A polymer is considered to be
"substantially pure" within a predefined region of a substrate
when it exhibits characteristics that distinguish it from
other predefined regions. Typically, purity will be
measured in terms of biological activity or function as a result
of uniform sequence. Such characteristics will typically
be measured by way of binding with a selected ligand or
receptor.

II. General

The present invention provides methods and apparatus for
the preparation and use of a substrate having a plurality of
polymer sequences in predefined regions. The invention is
described herein primarily with regard to the preparation of
molecules containing sequences of amino acids, but could
readily be applied in the preparation of other polymers. Such
polymers include, for example, both linear and cyclic
polymers of nucleic acids, polysaccharides, phospholipids, and
peptides having either α-, β-, or ω-amino acids,
heteropolymers in which a known drug is covalently bound to any
of the above, polyurethanes, polyesters, polycarbonates,
polyureas, polyamides, polyethyleneimines, polyarylene
sulfides, polysiloxanes, polyimides, polyacetates, or other
polymers which will be apparent upon review of this
disclosure. In a preferred embodiment, the invention herein is
used in the synthesis of peptides.

The prepared substrate may, for example, be used in
screening a variety of polymers as ligands for binding with
a receptor, although it will be apparent that the invention
could be used for the synthesis of a receptor for binding with
a ligand. The substrate disclosed herein will have a wide
variety of other uses. Merely by way of example, the
invention herein can be used in determining peptide and
nucleic acid sequences which bind to proteins, finding
sequence-specific binding drugs, identifying epitopes
recognized by antibodies, and evaluation of a variety of drugs
for clinical and diagnostic applications, as well as
combinations of the above.

The invention preferably provides for the use of a
substrate "S" with a surface. Linker molecules "L" are
optionally provided on a surface of the substrate. The purpose of
the linker molecules, in some embodiments, is to facilitate
receptor recognition of the synthesized polymers.

Optionally, the linker molecules may be chemically
protected for storage purposes. A chemical storage protective
group such as t-BOC (t-butoxycarbonyl) may be used in
some embodiments. Such chemical protective groups would
be chemically removed upon exposure to, for example,
acidic solution and would serve to protect the surface during
storage and be removed prior to polymer preparation.

On the substrate or a distal end of the linker molecules, a
functional group with a protective group P0 is provided. The
protective group P0 may be removed upon exposure to
radiation, electric fields, electric currents, or other activators
to expose the functional group.

In a preferred embodiment, the radiation is ultraviolet
(UV), infrared (IR), or visible light. As more fully described
below, the protective group may alternatively be an
electrochemically-sensitive group which may be removed in
the presence of an electric field. In still further alternative
embodiments, ion beams, electron beams, or the like may be
used for deprotection.

In some embodiments, the exposed regions and, therefore,
the area upon which each distinct polymer sequence is
synthesized are smaller than about 1 cm² or less than 1 mm².
In preferred embodiments the exposed area is less than about
10,000 μm² or, more preferably, less than 100 μm² and may,
in some embodiments, encompass the binding site for as few
as a single molecule. Within these regions, each polymer is
preferably synthesized in a substantially pure form.

Concurrently or after exposure of a known region of the
substrate to light, the surface is contacted with a first
monomer unit M1 which reacts with the functional group
which has been exposed by the deprotection step. The first
monomer includes a protective group P1. P1 may or may not
be the same as P0.

Accordingly, after a first cycle, known first regions of the
surface may comprise the sequence:

S-L-M1-P1

while remaining regions of the surface comprise the
sequence:

S-L-P0.

Thereafter, second regions of the surface (which may
include the first region) are exposed to light and contacted
with a second monomer M2 (which may or may not be the
same as M1) having a protective group P2. P2 may or may
not be the same as P0 and P1. After this second cycle,
different regions of the substrate may comprise one or more
of the following sequences:

S-L-M1-M2-P2
S-L-M2-P2
S-L-M1-P1 and/or
S-L-P0.

The above process is repeated until the substrate includes 
desired polymers of desired lengths. By controlling the 
locations of the substrate exposed to light and the reagents 
exposed to the substrate following exposure, the location of 
each sequence will be known. 
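
The cycle described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the disclosed apparatus: the 2x2 grid of regions, the `run_cycles` helper, and the representation of a mask as a set of illuminated regions are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Region = Tuple[int, int]  # (row, column) of a predefined region (hypothetical grid)


def run_cycles(regions: List[Region],
               cycles: List[Tuple[Set[Region], str]]) -> Dict[Region, List[str]]:
    """Apply (mask, monomer) cycles; only illuminated regions are extended."""
    chains: Dict[Region, List[str]] = {r: [] for r in regions}
    for mask, monomer in cycles:
        for region in mask:
            # Light removes the protective group here, so the monomer couples.
            chains[region].append(monomer)
        # Regions outside the mask stay protected and are left unchanged.
    return chains


regions = [(0, 0), (0, 1), (1, 0), (1, 1)]
cycles = [
    ({(0, 0), (0, 1)}, "M1"),  # first cycle: first regions receive M1
    ({(0, 0), (1, 0)}, "M2"),  # second cycle: second regions receive M2
]
chains = run_cycles(regions, cycles)
for region in regions:
    # "P" stands for whichever protective group currently terminates the chain.
    print(region, "S-L-" + "-".join(chains[region] + ["P"]))
```

Because the masks and the order of monomers are recorded, the sequence at every region is known without any analysis of the surface itself, which is the point made in the paragraph above.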

Thereafter, the protective groups are removed from some
or all of the substrate and the sequences are, optionally,
capped with a capping unit C. The process results in a
substrate having a surface with a plurality of polymers of the
following general formula:

S-[L]-(M_i)-(M_j)-(M_k) . . . (M_x)-[C]

where square brackets indicate optional groups, and M_i . . .
M_x indicates any sequence of monomers. The number of
monomers could cover a wide variety of values, but in a
preferred embodiment they will range from 2 to 100.

In some embodiments the polymers at a plurality of
locations on the substrate are to contain a common monomer
subsequence. For example, it may be desired to synthesize
a sequence S-M1-M2-M3 at first locations and a sequence
S-M4-M2-M3 at second locations. The process would
commence with irradiation of the first locations followed by
contacting with M1-P, resulting in the sequence S-M1-P at
the first locations. The second locations would then be
irradiated and contacted with M4-P, resulting in the sequence
S-M4-P at the second locations. Thereafter both the first and
second locations would be irradiated and contacted with the
dimer M2-M3, resulting in the sequence S-M1-M2-M3 at the
first locations and S-M4-M2-M3 at the second locations. Of
course, common subsequences of any length could be
utilized including those in a range of 2 or more monomers, 2
to 100 monomers, 2 to 20 monomers, and a most preferred
range of 2 to 3 monomers.
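
As a rough illustration of why a shared subsequence saves steps, the example above can be modeled as follows. The `couple` helper and the location names are hypothetical, and the model ignores protective groups entirely; it only tracks which block is added where.

```python
from typing import Dict, List


def couple(chains: Dict[str, List[str]],
           locations: List[str],
           block: List[str]) -> None:
    """Irradiate the given locations and couple one block (monomer or dimer)."""
    for location in locations:
        chains[location].extend(block)


chains: Dict[str, List[str]] = {"first": [], "second": []}
steps = [
    (["first"], ["M1"]),                  # first locations receive M1
    (["second"], ["M4"]),                 # second locations receive M4
    (["first", "second"], ["M2", "M3"]),  # both receive the dimer M2-M3 at once
]
for locations, block in steps:
    couple(chains, locations, block)

print("S-" + "-".join(chains["first"]))
print("S-" + "-".join(chains["second"]))
print("coupling steps used:", len(steps))
```

In this sketch three coupling steps produce two trimers; adding M2 and M3 one monomer at a time would take four.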

According to other embodiments, a set of masks is used
for the first monomer layer and, thereafter, varied light
wavelengths are used for selective deprotection. For
example, in the process discussed above, first regions are
first exposed through a mask and reacted with a first
monomer having a first protective group P1, which is removable
upon exposure to a first wavelength of light (e.g., IR).
Second regions are masked and reacted with a second
monomer having a second protective group P2, which is
removable upon exposure to a second wavelength of light
(e.g., UV). Thereafter, masks become unnecessary in the
synthesis because the entire substrate may be exposed
alternatively to the first and second wavelengths of light in
the deprotection cycle.
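
The mask-free variant can be pictured with the short sketch below. It is a simplified model under stated assumptions: the `Site` class and `flood_expose_and_couple` function are hypothetical, and each newly added monomer is assumed to carry a protective group removable at the same wavelength as the one it replaced, so that the two sets of regions stay separately addressable.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    chain: List[str] = field(default_factory=list)
    removable_by: str = ""  # wavelength that removes the terminal protective group


def flood_expose_and_couple(sites: Dict[str, Site],
                            wavelength: str,
                            monomer: str) -> None:
    """Expose the whole substrate; only matching protective groups come off."""
    for site in sites.values():
        if site.removable_by == wavelength:
            # Assumption: the new monomer bears the same class of protective
            # group, so removable_by is left unchanged.
            site.chain.append(monomer)


# State after the masked first layer: P1 is IR-labile, P2 is UV-labile.
sites = {
    "first": Site(chain=["M1"], removable_by="IR"),
    "second": Site(chain=["M2"], removable_by="UV"),
}
flood_expose_and_couple(sites, "IR", "A")  # extends only the first regions
flood_expose_and_couple(sites, "UV", "B")  # extends only the second regions
print(sites["first"].chain, sites["second"].chain)
```
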
The polymers prepared on a substrate according to the
above methods will have a variety of uses including, for
example, screening for biological activity. In such screening
activities, the substrate containing the sequences is exposed
to an unlabeled or labeled receptor such as an antibody,
receptor on a cell, phospholipid vesicle, or any one of a
variety of other receptors. In one preferred embodiment the
polymers are exposed to a first, unlabeled receptor of interest
and, thereafter, exposed to a labeled receptor-specific
recognition element, which is, for example, an antibody. This
process will provide signal amplification in the detection
stage.

The receptor molecules may bind with one or more
polymers on the substrate. The presence of the labeled
receptor and, therefore, the presence of a sequence which
binds with the receptor is detected in a preferred
embodiment through the use of autoradiography, detection of
fluorescence with a charge-coupled device, fluorescence
microscopy, or the like. The sequence of the polymer at the
locations where the receptor binding is detected may be used
to determine all or part of a sequence which is
complementary to the receptor.
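
Reading out a screen then amounts to a lookup from detected locations to the sequences synthesized there. The sketch below is illustrative only: the checkerboard layout, the fluorescence counts, and the threshold are invented for the example, and `binding_sequences` is a hypothetical helper rather than a described component.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]


def binding_sequences(layout: Dict[Location, str],
                      signal: Dict[Location, int],
                      threshold: int) -> List[str]:
    """Return the sequences at locations whose signal meets the threshold."""
    hits = {layout[loc] for loc, counts in signal.items() if counts >= threshold}
    return sorted(hits)


# Known synthesis layout (a 2x2 checkerboard, chosen for illustration).
layout = {(0, 0): "YGGFL", (0, 1): "PGGFL",
          (1, 0): "PGGFL", (1, 1): "YGGFL"}

# Made-up photon counts per location after exposure to a labeled receptor.
signal = {(0, 0): 950, (0, 1): 40, (1, 0): 55, (1, 1): 1010}

binders = binding_sequences(layout, signal, threshold=500)
print(binders)
```
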

Use of the invention herein is illustrated primarily with
reference to screening for biological activity. The invention
will, however, find many other uses. For example, the
invention may be used in information storage (e.g., on
optical disks), production of molecular electronic devices,
production of stationary phases in separation sciences,
production of dyes and brightening agents, photography, and in
immobilization of cells, proteins, lectins, nucleic acids,
polysaccharides and the like in patterns on a surface via
molecular recognition of specific polymer sequences. By
synthesizing the same compound in adjacent, progressively
differing concentrations, a gradient will be established to
control chemotaxis or to develop diagnostic dipsticks which,
for example, titrate an antibody against an increasing
amount of antigen. By synthesizing several catalyst
molecules in close proximity, more efficient multistep
conversions may be achieved by "coordinate immobilization."
Coordinate immobilization also may be used for electron
transfer systems, as well as to provide both structural
integrity and other desirable properties to materials such as
lubrication, wetting, etc.

According to alternative embodiments, molecular
biodistribution or pharmacokinetic properties may be examined.
For example, to assess resistance to intestinal or serum
proteases, polymers may be capped with a fluorescent tag
and exposed to biological fluids of interest.

III. Polymer Synthesis

FIG. 1 illustrates one embodiment of the invention
disclosed herein in which a substrate 2 is shown in
cross-section. Essentially, any conceivable substrate may be
employed in the invention. The substrate may be biological,
nonbiological, organic, inorganic, or a combination of any of
these, existing as particles, strands, precipitates, gels, sheets,
tubing, spheres, containers, capillaries, pads, slices, films,
plates, slides, etc. The substrate may have any convenient
shape, such as a disc, square, sphere, circle, etc. The



US 6,261,776 B1

substrate is preferably flat but may take on a variety of
alternative surface configurations. For example, the
substrate may contain raised or depressed regions on which the
synthesis takes place. The substrate and its surface
preferably form a rigid support on which to carry out the reactions
described herein. The substrate and its surface is also chosen
to provide appropriate light-absorbing characteristics. For
instance, the substrate may be a polymerized Langmuir
Blodgett film, functionalized glass, Si, Ge, GaAs, GaP, SiO2,
SiN4, modified silicon, or any one of a wide variety of gels
or polymers such as (poly)tetrafluoroethylene,
(poly)vinylidenedifluoride, polystyrene, polycarbonate, or
combinations thereof. Other substrate materials will be readily
apparent to those of skill in the art upon review of this
disclosure. In a preferred embodiment the substrate is flat
glass or single-crystal silicon with surface relief features of
less than 10 Å.

According to some embodiments, the surface of the
substrate is etched using well known techniques to provide
for desired surface features. For example, by way of the
formation of trenches, v-grooves, mesa structures, or the
like, the synthesis regions may be more closely placed
within the focus point of impinging light, be provided with
reflective "mirror" structures for maximization of light
collection from fluorescent sources, or the like.

Surfaces on the solid substrate will usually, though not
always, be composed of the same material as the substrate.
Thus, the surface may be composed of any of a wide variety
of materials, for example, polymers, plastics, resins,
polysaccharides, silica or silica-based materials, carbon,
metals, inorganic glasses, membranes, or any of the
above-listed substrate materials. In some embodiments the surface
may provide for the use of caged binding members which
are attached firmly to the surface of the substrate in accord
with the teaching of copending application Ser. No. 404,920,
previously incorporated herein by reference. Preferably, the
surface will contain reactive groups, which could be
carboxyl, amino, hydroxyl, or the like. Most preferably, the
surface will be optically transparent and will have surface
Si-OH functionalities, such as are found on silica surfaces.

The surface 4 of the substrate is preferably provided with
a layer of linker molecules 6, although it will be understood
that the linker molecules are not required elements of the
invention. The linker molecules are preferably of sufficient
length to permit polymers in a completed substrate to
interact freely with molecules exposed to the substrate. The
linker molecules should be 6-50 atoms long to provide
sufficient exposure. The linker molecules may be, for
example, aryl acetylene, ethylene glycol oligomers
containing 2-10 monomer units, diamines, diacids, amino acids, or
combinations thereof. Other linker molecules may be used
in light of this disclosure.

According to alternative embodiments, the linker
molecules are selected based upon their hydrophilic/
hydrophobic properties to improve presentation of
synthesized polymers to certain receptors. For example, in the case
of a hydrophilic receptor, hydrophilic linker molecules will
be preferred so as to permit the receptor to more closely
approach the synthesized polymer.

According to another alternative embodiment, linker
molecules are also provided with a photocleavable group at an
intermediate position. The photocleavable group is
preferably cleavable at a wavelength different from the protective
group. This enables removal of the various polymers
following completion of the synthesis by way of exposure to
the different wavelengths of light.

The linker molecules can be attached to the substrate via
carbon-carbon bonds using, for example,
(poly)trifluorochloroethylene surfaces, or preferably, by siloxane
bonds (using, for example, glass or silicon oxide surfaces).
Siloxane bonds with the surface of the substrate may be
formed in one embodiment via reactions of linker molecules
bearing trichlorosilyl groups. The linker molecules may
optionally be attached in an ordered array, i.e., as parts of the
head groups in a polymerized Langmuir Blodgett film. In
alternative embodiments, the linker molecules are adsorbed
to the surface of the substrate.

The linker molecules and monomers used herein are
provided with a functional group to which is bound a
protective group. Preferably, the protective group is on the
distal or terminal end of the linker molecule opposite the
substrate. The protective group may be either a negative
protective group (i.e., the protective group renders the linker
molecules less reactive with a monomer upon exposure) or
a positive protective group (i.e., the protective group renders
the linker molecules more reactive with a monomer upon
exposure). In the case of negative protective groups an
additional step of reactivation will be required. In some
embodiments, this will be done by heating.

The protective group on the linker molecules may be
selected from a wide variety of positive light-reactive groups
preferably including nitro aromatic compounds such as
o-nitrobenzyl derivatives or benzylsulfonyl. In a preferred
embodiment, 6-nitroveratryloxycarbonyl (NVOC),
2-nitrobenzyloxycarbonyl (NBOC) or α,α-dimethyl-
dimethoxybenzyloxycarbonyl (DDZ) is used. In one
embodiment, a nitro aromatic compound containing a
benzylic hydrogen ortho to the nitro group is used, i.e., a
chemical of the form:

[structural formula not reproduced]

where R1 is alkoxy, alkyl, halo, aryl, alkenyl, or hydrogen;
R2 is alkoxy, alkyl, halo, aryl, nitro, or hydrogen; R3 is
alkoxy, alkyl, halo, nitro, aryl, or hydrogen; R4 is alkoxy,
alkyl, hydrogen, aryl, halo, or nitro; and R5 is alkyl, alkynyl,
cyano, alkoxy, hydrogen, halo, aryl, or alkenyl. Other
materials which may be used include o-hydroxy-α-methyl
cinnamoyl derivatives. Photoremovable protective groups are
described in, for example, Patchornik, J. Am. Chem. Soc.
(1970) 92:6333 and Amit et al., J. Org. Chem. (1974)
39:192, both of which are incorporated herein by reference.

In an alternative embodiment the positive reactive group
is activated for reaction with reagents in solution. For
example, a 5-bromo-7-nitro indoline group, when bound to
a carbonyl, undergoes reaction upon exposure to light at 420
nm.

In a second alternative embodiment, the reactive group on
the linker molecule is selected from a wide variety of
negative light-reactive groups including a cinnamate group.
Alternatively, the reactive group is activated or
deactivated by electron beam lithography, x-ray lithography, or
any other radiation. Suitable reactive groups for electron
beam lithography include sulfonyl. Other methods may be
used including, for example, exposure to a current source.
Other reactive groups and methods of activation may be
used in light of this disclosure.

As shown in FIG. 1, the linking molecules are preferably
exposed to, for example, light through a suitable mask 8
using photolithographic techniques of the type known in the
semiconductor industry and described in, for example, Sze,
VLSI Technology, McGraw-Hill (1983), and Mead et al.,
Introduction to VLSI Systems, Addison-Wesley (1980),
which are incorporated herein by reference for all purposes.
The light may be directed at either the surface containing the
protective groups or at the back of the substrate, so long as
the substrate is transparent to the wavelength of light needed
for removal of the protective groups. In the embodiment
shown in FIG. 1, light is directed at the surface of the
substrate containing the protective groups. FIG. 1 illustrates
the use of such masking techniques as they are applied to a
positive reactive group so as to activate linking molecules
and expose functional groups in areas 10a and 10b.

The mask 8 is in one embodiment a transparent support 
material selectively coated with a layer of opaque material. 
Portions of the opaque material are removed, leaving opaque 
material in the precise pattern desired on the substrate 
surface. The mask is brought into close proximity with, 
imaged on, or brought directly into contact with the substrate 
surface as shown in FIG. 1. "Openings" in the mask correspond
to locations on the substrate where it is desired to
remove photoremovable protective groups from the substrate.
Alignment may be performed using conventional
alignment techniques in which alignment marks (not shown)
are used to accurately overlay successive masks with previous
patterning steps, or more sophisticated techniques may
be used. For example, interferometric techniques such as the 
one described in Flanders et al., "A New Interferometric 
Alignment Technique," App. Phys. Lett. (1977) 31:426-428, 
which is incorporated herein by reference, may be used. 

To enhance contrast of light applied to the substrate, it is 
desirable to provide contrast enhancement materials 
between the mask and the substrate according to some 
embodiments. This contrast enhancement layer may com- 
prise a molecule which is decomposed by light such as 
quinone diazide or a material which is transiently bleached at
the wavelength of interest. Transient bleaching of materials 
will allow greater penetration where light is applied, thereby 
enhancing contrast. Alternatively, contrast enhancement 
may be provided by way of a cladded fiber optic bundle. 

The light may be from a conventional incandescent 
source, a laser, a laser diode, or the like. If non-collimated
sources of light are used it may be desirable to provide a 
thick- or multi-layered mask to prevent spreading of the 
light onto the substrate. It may, further, be desirable in some 
embodiments to utilize groups which are sensitive to differ- 
ent wavelengths to control synthesis. For example, by using 
groups which are sensitive to different wavelengths, it is 
possible to select branch positions in the synthesis of a 
polymer or eliminate certain masking steps. Several reactive 
groups along with their corresponding wavelengths for 
deprotection are provided in Table 1. 



TABLE 1

Group                                   Approximate Deprotection Wavelength

Nitroveratryloxy carbonyl (NVOC)        UV (300-400 nm)
Nitrobenzyloxy carbonyl (NBOC)          UV (300-350 nm)
Dimethyl dimethoxybenzyloxy carbonyl    UV (280-300 nm)
5-Bromo-7-nitroindolinyl                UV (420 nm)
o-Hydroxy-α-methyl cinnamoyl            UV (300-350 nm)
2-Oxymethylene anthraquinone            UV (350 nm)
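
As an illustration of how wavelength-selective groups might be chosen, the following minimal Python sketch encodes the approximate ranges of Table 1 and checks which groups respond at a given wavelength and whether two groups have overlapping ranges. The function names and the abbreviated group labels are assumptions made for this sketch; they do not appear in the disclosure.

```python
# Approximate deprotection ranges from Table 1, in nm (low, high).
# Single-wavelength entries are stored as a zero-width range.
TABLE_1 = {
    "NVOC": (300, 400),
    "NBOC": (300, 350),
    "Dimethyl dimethoxybenzyloxy carbonyl": (280, 300),
    "5-Bromo-7-nitroindolinyl": (420, 420),
    "o-Hydroxy-alpha-methyl cinnamoyl": (300, 350),
    "2-Oxymethylene anthraquinone": (350, 350),
}


def removable_at(wavelength_nm):
    """Return the groups whose tabulated range includes the wavelength."""
    return sorted(
        name for name, (low, high) in TABLE_1.items()
        if low <= wavelength_nm <= high
    )


def ranges_overlap(group_a, group_b):
    """True if the two tabulated ranges share at least one wavelength."""
    low_a, high_a = TABLE_1[group_a]
    low_b, high_b = TABLE_1[group_b]
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    print(removable_at(365))
    print(ranges_overlap("NVOC", "5-Bromo-7-nitroindolinyl"))
```

Two groups whose ranges do not overlap are candidates for independent removal, subject to the actual absorption behavior of the groups, which the table only approximates.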



While the invention is illustrated primarily herein by way 
of the use of a mask to illuminate selected regions of the
substrate, other techniques may also be used. For example,
the substrate may be translated under a modulated laser or
diode light source. Such techniques are discussed in, for
example, U.S. Pat. No. 4,719,615 (Feyrer et al.), which is
incorporated herein by reference. In alternative embodi- 
ments a laser galvanometric scanner is utilized. In other 
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embodiments, the synthesis may take place on or in contact 
with a conventional liquid crystal (referred to herein as a
"light valve") or fiber optic light sources. By appropriately
modulating liquid crystals, light may be selectively con- 
trolled so as to permit light to contact selected regions of the 
substrate. Alternatively, synthesis may take place on the end 
of a series of optical fibers to which light is selectively 
applied. Other means of controlling the location of light 
exposure will be apparent to those of skill in the art.
The substrate may be irradiated either in contact or not in
contact with a solution (not shown) and is, preferably,
irradiated in contact with a solution. The solution contains
reagents to prevent the by-products formed by irradiation
from interfering with synthesis of the polymer according to
some embodiments. Such by-products might include, for
example, carbon dioxide, nitrosocarbonyl compounds, styrene
derivatives, indole derivatives, and products of their
photochemical reactions. Alternatively, the solution may
contain reagents used to match the index of refraction of the
substrate. Reagents added to the solution may further
include, for example, acidic or basic buffers, thiols, substituted
hydrazines and hydroxylamines, reducing agents (e.g.,
NADH) or reagents known to react with a given functional
group (e.g., aryl nitroso + glyoxylic acid → aryl
formhydroxamate + CO2).
Either concurrently with or after the irradiation step, the
linker molecules are washed or otherwise contacted with a
first monomer, illustrated by "A" in regions 12a and 12b in
FIG. 2. The first monomer reacts with the activated functional
groups of the linkage molecules which have been
exposed to light. The first monomer, which is preferably an
amino acid, is also provided with a photoprotective group.
The photoprotective group on the monomer may be the same
as or different than the protective group used in the linkage
molecules, and may be selected from any of the above-described
protective groups. In one embodiment, the protective
group for the A monomer is selected from the group
NBOC and NVOC.

As shown in FIG. 3, the process of irradiating is thereafter
repeated, with a mask repositioned so as to remove linkage
protective groups and expose functional groups in regions
14a and 14b which are illustrated as being regions which
were protected in the previous masking step. As an alternative
to repositioning of the first mask, in many embodiments
a second mask will be utilized. In other alternative
embodiments, some steps may provide for illuminating a
common region in successive steps. As shown in FIG. 3, it
may be desirable to provide separation between irradiated
regions. For example, separation of about 1-5 µm may be
appropriate to account for alignment tolerances.

As shown in FIG. 4, the substrate is then exposed to a
second protected monomer "B," producing B regions 16a
and 16b. Thereafter, the substrate is again masked so as to
remove the protective groups and expose reactive groups on
A region 12a and B region 16b. The substrate is again
exposed to monomer B, resulting in the formation of the
structure shown in FIG. 6. The dimers B-A and B-B have
been produced on the substrate.

A subsequent series of masking and contacting steps
similar to those described above with A (not shown) provides
the structure shown in FIG. 7. The process provides all
possible dimers of B and A, i.e., B-A, A-B, A-A, and B-B.

The substrate, the area of synthesis, and the area for 
synthesis of each individual polymer could be of any size or 
shape. For example, squares, ellipsoids, rectangles, 
triangles, circles, or portions thereof, along with irregular
geometric shapes, may be utilized. Duplicate synthesis areas
may also be applied to a single substrate for purposes of
redundancy.

In one embodiment the regions 12 and 16 on the substrate
will have a surface area of between about 1 cm² and 10⁻¹⁰
cm². In some embodiments the regions 12 and 16 have areas
of less than about 10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm²,
10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm², 10⁻⁸ cm², or 10⁻¹⁰ cm². In
a preferred embodiment, the regions 12 and 16 are between
about 10×10 µm and 500×500 µm.
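
To make the preferred region sizes concrete, the short Python sketch below converts a square region's side length to an area in cm² and counts how many such regions would tile a square substrate. It assumes ideal packing with no separation between regions, which is a simplification; the function names are hypothetical.

```python
UM_PER_CM = 10_000  # micrometres per centimetre


def region_area_cm2(side_um):
    """Area in cm^2 of a square region whose side is given in micrometres."""
    return (side_um / UM_PER_CM) ** 2


def regions_on_square(substrate_side_cm, region_side_um):
    """Number of whole square regions that tile a square substrate.

    Assumes no gap between regions (an idealisation for this sketch).
    """
    per_side = round(substrate_side_cm * UM_PER_CM) // region_side_um
    return per_side ** 2


if __name__ == "__main__":
    for side in (10, 500):
        print(side, region_area_cm2(side), regions_on_square(1, side))
```

Under these assumptions a 10×10 µm region has an area of about 10⁻⁶ cm², and a 500×500 µm region about 2.5×10⁻³ cm².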

In some embodiments a single substrate supports more
than about 10 different monomer sequences and preferably
more than about 100 different monomer sequences, although
in some embodiments more than about 10³, 10⁴, 10⁵, 10⁶,
10⁷, or 10⁸ different sequences are provided on a substrate.
Of course, within a region of the substrate in which a
monomer sequence is synthesized, it is preferred that the
monomer sequence be substantially pure. In some
embodiments, regions of the substrate contain polymer
sequences which are at least about 1%, 5%, 10%, 15%, 20%,
25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%,
95%, 96%, 97%, 98%, or 99% pure.

According to some embodiments, several sequences are 
intentionally provided within a single region so as to provide 
an initial screening for biological activity, after which materials
within regions exhibiting significant binding are further
evaluated. 

IV. Details of One Embodiment of a Reactor
System

FIG. 8A schematically illustrates a preferred embodiment
of a reactor system 100 for synthesizing polymers on the
prepared substrate in accordance with one aspect of the
invention. The reactor system includes a body 102 with a
cavity 104 on a surface thereof. In preferred embodiments
the cavity 104 is between about 50 and 1000 µm deep with
a depth of about 500 µm preferred.

The bottom of the cavity is preferably provided with an 
array of ridges 106 which extend both into the plane of the 
Figure and parallel to the plane of the Figure. The ridges are 
preferably about 50 to 200 µm deep and spaced at about 2
to 3 mm. The purpose of the ridges is to generate turbulent
flow for better mixing. The bottom surface of the cavity is 
preferably light absorbing so as to prevent reflection of 
impinging light. 

A substrate 112 is mounted above the cavity 104. The
substrate is provided along its bottom surface 114 with a
photoremovable protective group such as NVOC with or
without an intervening linker molecule. The substrate is
preferably transparent to a wide spectrum of light, but in
some embodiments is transparent only at a wavelength at
which the protective group may be removed (such as UV in 
the case of NVOC). The substrate in some embodiments is 
a conventional microscope glass slide or cover slip. The 
substrate is preferably as thin as possible, while still pro- 
viding adequate physical support. Preferably, the substrate is 
less than about 1 mm thick, more preferably less than 0.5 
mm thick, more preferably less than 0.1 mm thick, and most 
preferably less than 0.05 mm thick. In alternative preferred 
embodiments, the substrate is quartz or silicon. 

The substrate and the body serve to seal the cavity except 
for an inlet port 108 and an outlet port 110. The body and the 
substrate may be mated for sealing in some embodiments 
with one or more gaskets. According to a preferred 
embodiment, the body is provided with two concentric 
gaskets and the intervening space is held at vacuum to 
ensure mating of the substrate to the gaskets. 

Fluid is pumped through the inlet port into the cavity by 
way of a pump 116 which may be, for example, a model no. 
B-120-S made by Eldex Laboratories. Selected fluids are 
circulated into the cavity by the pump, through the cavity, 
and out the outlet for recirculation or disposal. The reactor
may be subjected to ultrasonic radiation and/or heated to aid 
in agitation in some embodiments. 
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Above the substrate 112, a lens 120 is provided which 
may be, for example, a 2" 100 mm focal length fused silica 
lens. For the sake of a compact system, a reflective mirror 
122 may be provided for directing light from a light source 
124 onto the substrate. Light source 124 may be, for 
example, a Xe(Hg) light source manufactured by Oriel and 
having model no. 66024. A second lens 126 may be provided 
for the purpose of projecting a mask image onto the substrate 
in combination with lens 120. This form of lithography is
referred to herein as projection printing. As will be apparent 
from this disclosure, proximity printing and the like may 
also be used according to some embodiments. 

Light from the light source is permitted to reach only 
selected locations on the substrate as a result of mask 128. 
Mask 128 may be, for example, a glass slide having etched 
chrome thereon. The mask 128 in one embodiment is 
provided with a grid of transparent locations and opaque 
locations. Such masks may be manufactured by, for 
example. Photo Sciences, Inc. Light passes freely through 
the transparent regions of the mask, but is reflected from or 
absorbed by other regions. Therefore, only selected regions 
of the substrate are exposed to light. 

As discussed above, light valves (LCD's) may be used as 
an alternative to conventional masks to selectively expose 
regions of the substrate. Alternatively, fiber optic faceplates 
such as those available from Schott Glass, Inc. may be used 
for the purpose of contrast enhancement of the mask or as 
the sole means of restricting the region to which light is 
applied. Such faceplates would be placed directly above or 
on the substrate in the reactor shown in FIG. 8A. In still
further embodiments, fly's-eye lenses, tapered fiber optic
faceplates, or the like, may be used for contrast enhancement.

In order to provide for illumination of regions smaller 
than a wavelength of light, more elaborate techniques may 
be utilized. For example, according to one preferred 
embodiment, light is directed at the substrate by way of 
molecular microcrystals on the tip of, for example, micropi- 
pettes. Such devices are disclosed in Lieberman et al., "A 
Light Source Smaller Than the Optical Wavelength," Sci- 
ence (1990) 247:59-61, which is incorporated herein by 
reference for all purposes. 

In operation, the substrate is placed on the cavity and 
sealed thereto. All operations in the process of preparing the 
substrate are carried out in a room lit primarily or entirely by 
light of a wavelength outside of the light range at which the 
protective group is removed. For example, in the case of 
NVOC, the room should be lit with a conventional dark 
room light which provides little or no UV light. All opera- 
tions are preferably conducted at about room temperature. 

A first, deprotection fluid (without a monomer) is circulated
through the cavity. The solution preferably is of 5 mM
sulfuric acid in dioxane solution which serves to keep
exposed amino groups protonated and decreases their reactivity
with photolysis by-products. Absorptive materials
such as N,N-diethylamino 2,4-dinitrobenzene, for example,
may be included in the deprotection fluid to
absorb light and prevent reflection and unwanted photolysis.

The slide is, thereafter, positioned in a light raypath from 
the mask such that first locations on the substrate are 
illuminated and, therefore, deprotected. In preferred 
embodiments the substrate is illuminated for between about 
1 and 15 minutes with a preferred illumination time of about
10 minutes at 10-20 mW/cm² with 365 nm light. The slides
are neutralized (i.e., brought to a pH of about 7) after 
photolysis with, for example, a solution of 
di-isopropylethylamine (DIEA) in methylene chloride for 
about 5 minutes. 
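
For orientation, the preferred exposure can be restated as an energy dose per unit area (irradiance multiplied by time). The following Python sketch performs that conversion; the function name is an assumption made for this illustration.

```python
def exposure_dose_j_per_cm2(irradiance_mw_per_cm2, minutes):
    """Energy delivered per unit area, in J/cm^2.

    dose = irradiance (mW/cm^2) x time (s) / 1000 (mW per W)
    """
    seconds = minutes * 60
    return irradiance_mw_per_cm2 * seconds / 1000.0


if __name__ == "__main__":
    # Preferred conditions: about 10 minutes at 10-20 mW/cm^2.
    print(exposure_dose_j_per_cm2(10, 10))
    print(exposure_dose_j_per_cm2(20, 10))
```

On this arithmetic, 10 minutes at 10-20 mW/cm² corresponds to roughly 6 to 12 J/cm².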

The first monomer is then placed at the first locations on
the substrate. After irradiation, the slide is removed, treated
in bulk, and then reinstalled in the flow cell. Alternatively,
a fluid containing the first monomer, preferably also protected
by a protective group, is circulated through the cavity
by way of pump 116. If, for example, it is desired to attach
the amino acid Y to the substrate at the first locations, the
amino acid Y (bearing a protective group on its α-nitrogen),
along with reagents used to render the monomer reactive,
and/or a carrier, is circulated from a storage container 118
through the pump, through the cavity, and back to the inlet
of the pump.

The monomer carrier solution is, in a preferred
embodiment, formed by mixing of a first solution (referred
to herein as solution "A") and a second solution (referred to
herein as solution "B"). Table 2 provides an illustration of a
mixture which may be used for solution A.

TABLE 2

Representative Monomer Carrier Solution "A"

100 mg NVOC amino protected amino acid
[amount illegible] DMF
[amount illegible] DIEA (diisopropylethylamine)

The composition of solution B is illustrated in Table 3.
Solutions A and B are mixed and allowed to react at room
temperature for about 8 minutes, then diluted with 2 ml of
DMF, and 500 µl are applied to the surface of the slide or the
solution is circulated through the reactor system and allowed
to react for about 2 hours at room temperature. The slide is
then washed with DMF, methylene chloride and ethanol.

TABLE 3

Representative Monomer Carrier Solution "B"

111 mg BOP (Benzotriazolyl-n-oxy-tris(dimethylamino)
phosphonium hexafluorophosphate)

As the solution containing the monomer to be attached is
circulated through the cavity, the amino acid or other monomer
will react at its carboxy terminus with amino groups on
the regions of the substrate which have been deprotected. Of
course, while the invention is illustrated by way of circulation
of the monomer through the cavity, the invention could
be practiced by way of removing the slide from the reactor
and submersing it in an appropriate monomer solution.

After addition of the first monomer, the solution containing
the first amino acid is then purged from the system. After
circulation of a sufficient amount of the DMF/methylene
chloride such that removal of the amino acid can be assured
(e.g., about 50 times the volume of the cavity and carrier
lines), the mask or substrate is repositioned, or a new mask
is utilized such that second regions on the substrate will be
exposed to light and the light 124 is engaged for a second
exposure. This will deprotect second regions on the substrate
and the process is repeated until the desired polymer
sequences have been synthesized.

The entire derivatized substrate is then exposed to a
receptor of interest, preferably labeled with, for example, a
fluorescent marker, by circulation of a solution or suspension
of the receptor through the cavity or by contacting the
surface of the slide in bulk. The receptor will preferentially
bind to certain regions of the substrate which contain
complementary sequences.

Antibodies are typically suspended in what is commonly
referred to as "supercocktail," which may be, for example,
a solution of about 1% BSA (bovine serum albumin), 0.5%
Tween in PBS (phosphate buffered saline) buffer. The antibodies
are diluted into the supercocktail buffer to a final
concentration of, for example, about 0.1 to 4 µg/ml.

FIG. 8B illustrates an alternative preferred embodiment of
the reactor shown in FIG. 8A. According to this
embodiment, the mask 128 is placed directly in contact with
the substrate. Preferably, the etched portion of the mask is
placed face down so as to reduce the effects of light
dispersion. According to this embodiment, the imaging
lenses 120 and 126 are not necessary because the mask is
brought into close proximity with the substrate.

For purposes of increasing the signal-to-noise ratio of the
technique, some embodiments of the invention provide for
exposure of the substrate to a first labeled or unlabeled
receptor followed by exposure of a labeled, second receptor
(e.g., an antibody) which binds at multiple sites on the first
receptor. If, for example, the first receptor is an antibody
derived from a first species of an animal, the second receptor
is an antibody derived from a second species directed to
[illegible] the first species. In the case of a
mouse antibody, for example, fluorescently labeled goat
antibody or antiserum which is antimouse may be used to
bind at multiple sites on the mouse antibody, providing
several times the fluorescence compared to the attachment of
a single mouse antibody at each binding site. This process
may be repeated again with additional antibodies (e.g.,
goat-mouse-goat, etc.) for further signal amplification.

In preferred embodiments an ordered sequence of masks
is utilized. In some embodiments it is possible to use as few
as a single mask to synthesize all of the possible polymers
of a given monomer set.

If, for example, it is desired to synthesize all 16 dinucleotides
from four bases, a 1 cm square synthesis region is
divided conceptually into 16 boxes, each 0.25 cm wide.
Denote the four monomer units by A, B, C, and D. The first
[illegible] columns, each 0.25
cm wide. The first mask exposes the left-most column of
boxes, where A is coupled. The second mask exposes the
next column, where B is coupled; followed by a third mask,
for the C column; and a final mask that exposes the right-most
column, for D. The first, second, third, and fourth
masks may be a single mask translated to different locations.

The process is repeated in the horizontal direction for the
second unit of the dimer. This time, the masks allow
exposure of horizontal rows, again 0.25 cm wide. A, B, C,
and D are sequentially coupled using masks that expose
horizontal fourths of the reaction area. The resulting substrate
contains all 16 dinucleotides of four bases.

The eight masks used to synthesize the dinucleotide are
related to one another by translation or rotation. In fact one
mask can be used in all eight steps if it is suitably rotated and
translated. For example, in the example above a mask with
a single transparent region could be sequentially used to
expose each of the vertical columns, translated 90°, and then
sequentially used to allow exposure of the horizontal rows.

Tables 4 and 5 provide a simple computer program in
Quick Basic for planning a masking program and a sample
output, respectively, for the synthesis of a polymer chain of
three monomers ("residues") having three different monomers
in the first level, four different monomers in the second
level, and five different monomers in the third level in a
striped pattern. The output of the program is the number of
cells, the number of "stripes" (light regions) on each mask,
and the amount of translation required for each exposure of
the mask.
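
The 16-dinucleotide example above can be sketched as a small simulation. The Python code below is only an illustration under simplifying assumptions: each masking step is modeled as appending one monomer to every cell in the exposed column or row, and the function name is hypothetical.

```python
def synthesize_dimers(monomers):
    """Simulate column-then-row masking on an n x n grid of cells.

    Returns the grid of dimer strings and the number of masking steps.
    """
    n = len(monomers)
    grid = [["" for _ in range(n)] for _ in range(n)]
    steps = 0

    # First unit: each mask exposes one vertical column.
    for col, monomer in enumerate(monomers):
        for row in range(n):
            grid[row][col] += monomer
        steps += 1

    # Second unit: each mask exposes one horizontal row.
    for row, monomer in enumerate(monomers):
        for col in range(n):
            grid[row][col] += monomer
        steps += 1

    return grid, steps


if __name__ == "__main__":
    grid, steps = synthesize_dimers("ABCD")
    for row in grid:
        print(" ".join(row))
    print("masking steps:", steps)
```

With four monomers the simulation uses eight masking steps and fills sixteen cells, each holding a different dimer, consistent with the description above.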
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TABLE 4 

Mask Strategy Program

DEFINT A-Z
DIM b(20), w(20), l(500)
F$ = "LPT1:"
OPEN F$ FOR OUTPUT AS #1
jmax = 3 'Number of residues
b(1) = 3: b(2) = 4: b(3) = 5 'Number of building blocks for res 1, 2, 3
g = 1: lmax(1) = 1
FOR j = 1 TO jmax: g = g * b(j): NEXT j
w(0) = 0: w(1) = g / b(1)
PRINT #1, "MASK2.BAS", DATE$, TIME$: PRINT #1,
PRINT #1, USING "Number of residues=##"; jmax
FOR j = 1 TO jmax
    PRINT #1, USING " Residue ## ## building blocks"; j; b(j)
NEXT j
PRINT #1,
PRINT #1, USING "Number of cells=####"; g: PRINT #1,
FOR j = 2 TO jmax
    lmax(j) = lmax(j - 1) * b(j - 1)
    w(j) = w(j - 1) / b(j)
NEXT j
FOR j = 1 TO jmax
    PRINT #1, USING "Mask for residue ##"; j: PRINT #1,
    PRINT #1, USING " Number of stripes=###"; lmax(j)
    PRINT #1, USING " Width of each stripe=###"; w(j)
    FOR l = 1 TO lmax(j)
        a = 1 + (l - 1) * w(j - 1)
        ae = a + w(j) - 1
        PRINT #1, USING " Stripe ## begins at location ### and ends at ###"; l; a; ae
    NEXT l
    PRINT #1,
    PRINT #1, USING " For each of ## building blocks, translate mask by ## cell(s)"; b(j); w(j)
    PRINT #1, : PRINT #1, : PRINT #1,
NEXT j

© Copyright 1990, Affymax N.V.
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TABLES 



Masking Strategy Output

Number of residues= 3
  Residue 1  3 building blocks
  Residue 2  4 building blocks
  Residue 3  5 building blocks
Number of cells= 60

Mask for residue 1
  Number of stripes= 1
  Width of each stripe= 20
  Stripe 1 begins at location 1 and ends at 20
  For each of 3 building blocks, translate mask by 20 cell(s)

Mask for residue 2
  Number of stripes= 3
  Width of each stripe= 5
  Stripe 1 begins at location 1 and ends at 5
  Stripe 2 begins at location 21 and ends at 25
  Stripe 3 begins at location 41 and ends at 45
  For each of 4 building blocks, translate mask by 5 cell(s)

Mask for residue 3
  Number of stripes= 12
  Width of each stripe= 1
  Stripe 1 begins at location 1 and ends at 1
  Stripe 2 begins at location 6 and ends at 6
  Stripe 3 begins at location 11 and ends at 11
  Stripe 4 begins at location 16 and ends at 16
  Stripe 5 begins at location 21 and ends at 21
  Stripe 6 begins at location 26 and ends at 26
  Stripe 7 begins at location 31 and ends at 31
  Stripe 8 begins at location 36 and ends at 36
  Stripe 9 begins at location 41 and ends at 41
  Stripe 10 begins at location 46 and ends at 46
  Stripe 11 begins at location 51 and ends at 51
  Stripe 12 begins at location 56 and ends at 56
  For each of 5 building blocks, translate mask by 1 cell(s)

© Copyright 1990, Affymax N.V.
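
For readers without a Quick Basic environment, the calculation of Table 4 can be restated in Python. The sketch below is a re-expression written for illustration, not the original program: it returns the plan as data rather than printing to a printer port, and the function and key names are assumptions.

```python
def mask_plan(blocks):
    """Return (cells, plan) for a striped masking strategy.

    blocks[i] is the number of building blocks at residue i + 1.
    """
    cells = 1
    for count in blocks:
        cells *= count

    plan = []
    stripes = 1      # the first mask has a single stripe
    width = cells    # the region is undivided before the first residue
    for residue, count in enumerate(blocks, start=1):
        period = width      # spacing between stripe starts
        width //= count     # each existing region is split `count` ways
        starts = [1 + k * period for k in range(stripes)]
        plan.append({
            "residue": residue,
            "stripes": stripes,
            "width": width,
            "spans": [(s, s + width - 1) for s in starts],
            "translate_by": width,
        })
        stripes *= count    # next mask: one stripe per existing region
    return cells, plan


if __name__ == "__main__":
    cells, plan = mask_plan([3, 4, 5])
    print("Number of cells =", cells)
    for mask in plan:
        print(mask)
```

Called with three, four, and five building blocks, the sketch yields 60 cells and the stripe counts, widths, and positions listed in Table 5.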

V. Details of One Embodiment of A Fluorescent 
Detection Device 

FIG. 9 illustrates a fluorescent detection device for detecting
fluorescently labeled receptors on a substrate. A substrate
112 is placed on an x/y translation table 202. In a
preferred embodiment the x/y translation table is a model no. 
PM500-A1 manufactured by Newport Corporation. The x/y 
translation table is connected to and controlled by an appro- 
priately programmed digital computer 204 which may be, 
for example, an appropriately programmed IBM PC/AT or 
AT compatible computer. Of course, other computer 
systems, special purpose hardware, or the like could readily 
be substituted for the AT computer used herein for illustra- 
tion. Computer software for the translation and data collec- 
tion functions described herein can be provided based on 
commercially available software including, for example, 
"Lab Windows" licensed by National Instruments, which is 
incorporated herein by reference for all purposes. 

The substrate and x/y translation table are placed under a 
microscope 206 which includes one or more objectives 208. 
Light (about 488 nm) from a laser 210, which in some
embodiments is a model no. 2020-05 argon ion laser manu- 
factured by Spectraphysics, is directed at the substrate by a 
dichroic mirror 207 which passes greater than about 520 nm 
light but reflects 488 nm light. Dichroic mirror 207 may be, 
for example, a model no. FT510 manufactured by Carl 
Zeiss. Light reflected from the mirror then enters the micro- 
scope 206 which may be, for example, a model no. Axioscop 
20 manufactured by Carl Zeiss. Fluorescein-marked mate- 
rials on the substrate will fluoresce >488 nm light, and the 
fluoresced light will be collected by the microscope and 
passed through the mirror. The fluorescent light from the 
substrate is then directed through a wavelength filter 209 
and, thereafter through an aperture plate 211. Wavelength 
filter 209 may be, for example, a model no. OG530 manu- 
factured by Melles Griot and aperture plate 211 may be, for 
example, a model no. 477352/477380 manufactured by Carl
Zeiss.

The fluoresced light then enters a photomultiplier tube
212 which in some embodiments is a model no. R943-02
manufactured by Hamamatsu. The signal is amplified in
preamplifier 214 and photons are counted by photon counter
216. The number of photons is recorded as a function of the
location in the computer 204. Pre-Amp 214 may be, for
example, a model no. SR440 manufactured by Stanford
Research Systems and photon counter 216 may be a model
no. SR400 manufactured by Stanford Research Systems.
The substrate is then moved to a subsequent location and the
process is repeated. In preferred embodiments the data are
acquired every 1 to 100 µm with a data collection diameter
of about 0.8 to 10 µm preferred. In embodiments with
sufficiently high fluorescence, a CCD detector with broad-field
illumination is utilized.
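
The move-and-count cycle described above might be organized as a simple raster scan. The following Python sketch is illustrative only: `move_to` and `count_photons` are hypothetical callables standing in for the translation table and photon counter interfaces, which the disclosure does not specify.

```python
def raster_scan(move_to, count_photons, nx, ny, step_um):
    """Record photon counts on an nx-by-ny grid of stage positions.

    move_to(x_um, y_um) positions the stage; count_photons() returns
    the count at the current position. Both are supplied by the caller.
    """
    image = []
    for iy in range(ny):
        row = []
        for ix in range(nx):
            move_to(ix * step_um, iy * step_um)
            row.append(count_photons())
        image.append(row)
    return image


if __name__ == "__main__":
    position = [0, 0]

    def fake_move(x_um, y_um):
        position[0], position[1] = x_um, y_um

    def fake_count():
        # Stand-in signal: brighter toward larger x and y.
        return position[0] + position[1]

    for row in raster_scan(fake_move, fake_count, nx=4, ny=3, step_um=10):
        print(row)
```

The returned list of rows is the photon count as a function of location, which is the quantity the computer 204 is described as recording.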

By counting the number of photons generated in a given 
area in response to the laser, it is possible to determine where
fluorescent marked molecules are located on the substrate.
Consequently, for a slide which has a matrix of polypeptides, 
for example, synthesized on the surface thereof, it is possible 
to determine which of the polypeptides is complementary to 
a fluorescently marked receptor. 

According to preferred embodiments, the intensity and 
duration of the light applied to the substrate is controlled by 
varying the laser power and scan stage rate for improved 
signal-to-noise ratio by maximizing fluorescence emission 
and minimizing background noise. 

While the detection apparatus has been illustrated prima- 
rily herein with regard to the detection of marked receptors, 
the invention will find application in other areas. For 
example, the detection apparatus disclosed herein could be 
used in the fields of catalysis, DNA or protein gel scanning, 
and the like.

VI. Determination of Relative Binding Strength of 
Receptors 

The signal-to-noise ratio of the present invention is sufficiently
high that not only can the presence or absence of a
receptor on a ligand be detected, but also the relative binding
affinity of receptors to a variety of sequences can be determined.

In practice it is found that a receptor will bind to several 
peptide sequences in an array, but will bind much more 
strongly to some sequences than others. Strong binding 
affinity will be evidenced herein by a strong fluorescent or
radiographic signal since many receptor molecules will bind
in a region of a strongly bound ligand. Conversely, a weak
binding affinity will be evidenced by a weak fluorescent or
radiographic signal due to the relatively small number of
receptor molecules which bind in a particular region of a
substrate having a ligand with a weak binding affinity for the
receptor. Consequently, it becomes possible to determine
relative binding avidity (or affinity in the case of univalent
interactions) of a ligand herein by way of the intensity of a
fluorescent or radiographic signal in a region containing that
ligand. 
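
One simple way to express relative binding from signal intensity is to subtract a background level and scale each region's count by the strongest region. The Python sketch below illustrates that idea under stated assumptions: the background subtraction, the scaling to the maximum, and the region labels are choices made for this example and are not prescribed by the disclosure.

```python
def relative_binding(counts, background=0):
    """Scale background-corrected counts so the strongest region is 1.0.

    counts maps a region label to its photon count.
    """
    corrected = {
        label: max(count - background, 0)
        for label, count in counts.items()
    }
    strongest = max(corrected.values())
    if strongest == 0:
        return {label: 0.0 for label in corrected}
    return {label: value / strongest for label, value in corrected.items()}


if __name__ == "__main__":
    # Hypothetical counts for three regions.
    example = {"region 1": 1100, "region 2": 600, "region 3": 200}
    for label, score in relative_binding(example, background=100).items():
        print(label, round(score, 2))
```

Such a ranking is only semiquantitative, since signal also depends on factors such as washing conditions and receptor concentration.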

Semiquantitative data on affinities might also be obtained 
by varying washing conditions and concentrations of the
receptor. This would be done by comparison to known 
ligand receptor pairs, for example. 

VII. Examples 

The following examples are provided to illustrate the 
efficacy of the inventions herein. All operations were con- 
ducted at about ambient temperatures and pressures unless 
indicated to the contrary. 
A. Slide Preparation 
Before attachment of reactive groups it is preferred to
clean the substrate which is, in a preferred embodiment, a
glass substrate such as a microscope slide or cover slip. 



us 6,261,776 Bl 

23 24 

According to one embodiment the slide is soaked in an 
alkaline bath consisting of, for example, 1 liter of 95% 
ethanol with 120 ml of water and 120 grams of sodium 
hydroxide for 12 hours. The slides are then washed under 
running water and allowed to air dry, and rinsed once with 
a solution of 95% ethanol. 

The slides are then aminated with, for example, 
aminopropyltriethoxysilane for the purpose of attaching amino 
groups to the glass surface on linker molecules, although any 
omega functionalized silane could also be used for this 
purpose. In one embodiment 0.1% aminopropyltriethoxysilane 
is utilized, although solutions with concentrations from 
10"''% to 10% may be used, with about 10^^% to 2% 
preferred. A 0.1% mixture is prepared by adding to 100 ml 
of a 95% ethanol/5% water mixture, 100 microliters (µl) of 
aminopropyltriethoxysilane. The mixture is agitated at about 
ambient temperature on a rotary shaker for about 5 minutes. 
500 µl of this mixture is then applied to the surface of one 
side of each cleaned slide. After 4 minutes, the slides are 
decanted of this solution and rinsed three times by dipping 
in, for example, 100% ethanol. 

After the plates dry, they are placed in a 110-120° C. 
vacuum oven for about 20 minutes, and then allowed to cure 
at room temperature for about 12 hours in an argon 
environment. The slides are then dipped into DMF 
(dimethylformamide) solution, followed by a thorough 
washing with methylene chloride. 

The aminated surface of the slide is then exposed to about 
500 µl of, for example, a 30 millimolar (mM) solution of 
NVOC-GABA (gamma amino butyric acid) NHS 
(N-hydroxysuccinimide) in DMF for attachment of a 
NVOC-GABA to each of the amino groups. 

The surface is washed with, for example, DMF, methylene 
chloride, and ethanol. 

Any unreacted aminopropyl silane on the surface (that is, 
those amino groups which have not had the NVOC-GABA 
attached) is now capped with acetyl groups (to prevent 
further reaction) by exposure to a 1:3 mixture of acetic 
anhydride in pyridine for 1 hour. Other materials which may 
perform this residual capping function include 
trifluoroacetic anhydride, formicacetic anhydride, or other 
reactive acylating agents. Finally, the slides are washed 
again with DMF, methylene chloride, and ethanol. 

B. Synthesis of Eight Trimers of "A" and "B" 

FIG. 10 illustrates a possible synthesis of the eight trimers 
of the two-monomer set: gly, phe (represented by "A" and 
"B," respectively). A glass slide bearing silane groups 
terminating in 6-nitroveratryloxycarboxamide (NVOC-NH) 
residues is prepared as a substrate. Active esters 
(pentafluorophenyl, OBt, etc.) of gly and phe protected at the 
amino group with NVOC are prepared as reagents. While 
not pertinent to this example, if side chain protecting groups 
are required for the monomer set, these must not be 
photoreactive at the wavelength of light used to protect the 
primary chain. 

For a monomer set of size n, n × l cycles are required to 
synthesize all possible sequences of length l. A cycle 
consists of: 

1. Irradiation through an appropriate mask to expose the 
   amino groups at the sites where the next residue is to be 
   added, with appropriate washes to remove the 
   by-products of the deprotection. 

2. Addition of a single activated and protected (with the 
   same photochemically-removable group) monomer, 
   which will react only at the sites addressed in step 1, 
   with appropriate washes to remove the excess reagent 
   from the surface. 

The above cycle is repeated for each member of the 
monomer set until each location on the surface has been 
extended by one residue in one embodiment. In other 
embodiments, several residues are sequentially added at one 
location before moving on to the next location. Cycle times 
will generally be limited by the coupling reaction rate, now 
as short as 20 min in automated peptide synthesizers. This 
step is optionally followed by addition of a protecting group 
to stabilize the array for later testing. For some types of 
polymers (e.g., peptides), a final deprotection of the entire 
surface (removal of photoprotective side chain groups) may 
be required. 

More particularly, as shown in FIG. 10A, the glass 20 is 
provided with regions 22, 24, 26, 28, 30, 32, 34, and 36. 
Regions 30, 32, 34, and 36 are masked, as shown in FIG. 
10B, and the glass is irradiated and exposed to a reagent 
containing "A" (e.g., gly), with the resulting structure shown 
in FIG. 10C. Thereafter, regions 22, 24, 26, and 28 are 
masked, the glass is irradiated (as shown in FIG. 10D) and 
exposed to a reagent containing "B" (e.g., phe), with the 
resulting structure shown in FIG. 10E. The process 
proceeds, consecutively masking and exposing the sections 
as shown until the structure shown in FIG. 10M is obtained. 
The glass is irradiated and the terminal groups are, 
optionally, capped by acetylation. As shown, all possible 
trimers of gly/phe are obtained. 

In this example, no side chain protective group removal is 
necessary. If it is desired, side chain deprotection may be 
accomplished by treatment with ethanedithiol and 
trifluoroacetic acid. 

In general, the number of steps needed to obtain a 
particular polymer chain is defined by: 

    n × l                                        (1) 

where: 

    n = the number of monomers in the basis set of monomers, 
        and 
    l = the number of monomer units in a polymer chain. 

Conversely, the synthesized number of sequences of 
length l will be: 

    n^l                                          (2) 

Of course, greater diversity is obtained by using masking 
strategies which will also include the synthesis of polymers 
having a length of less than l. If, in the extreme case, all 
polymers having a length less than or equal to l are 
synthesized, the number of polymers synthesized will be: 

    n^l + n^(l-1) + . . . + n^1                  (3) 

The maximum number of lithographic steps needed will 
generally be n for each "layer" of monomers, i.e., the total 
number of masks (and, therefore, the number of lithographic 
steps) needed will be n × l. The size of the transparent mask 
regions will vary in accordance with the area of the substrate 
available for synthesis and the number of sequences to be 
formed. In general, the size of the synthesis areas will be: 

    size of synthesis areas = (A)/(S) 

where: 

    A is the total area available for synthesis; and 
    S is the number of sequences desired in the area. 

It will be appreciated by those of skill in the art that the 
above method could readily be used to simultaneously 
produce thousands or millions of oligomers on a substrate 
using the photolithographic techniques disclosed herein. 
Consequently, the method results in the ability to practically 
test large numbers of, for example, di, tri, tetra, penta, hexa, 
hepta, octapeptides, dodecapeptides, or larger polypeptides 
(or correspondingly, polynucleotides). 

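The counting relations above (n × l cycles, n^l sequences of length l, the sum over shorter lengths, and the A/S area per synthesis region) can be checked with a short calculation. The following Python sketch is illustrative only; the function names are hypothetical and are not part of the disclosure. It uses the two-monomer, three-residue case of this example, with "A" and "B" standing for gly and phe. 

```python
from itertools import product


def cycles_needed(n: int, l: int) -> int:
    """Number of irradiate-and-couple cycles for n monomers and length l: n x l."""
    return n * l


def sequences_of_length(n: int, l: int) -> int:
    """Number of distinct sequences of exactly length l: n^l."""
    return n ** l


def sequences_up_to_length(n: int, l: int) -> int:
    """Number of sequences of every length from 1 to l: n^l + ... + n^1."""
    return sum(n ** k for k in range(1, l + 1))


def synthesis_area(total_area: float, num_sequences: int) -> float:
    """Area available to each sequence: A / S (same units as total_area)."""
    return total_area / num_sequences


# Two-monomer set ("A" = gly, "B" = phe), trimers.
monomers = ["A", "B"]
length = 3
trimers = ["".join(p) for p in product(monomers, repeat=length)]

print(cycles_needed(len(monomers), length))           # 6 cycles
print(sequences_of_length(len(monomers), length))     # 8 trimers
print(sequences_up_to_length(len(monomers), length))  # 14 (2 + 4 + 8)
print(trimers)
```

For the example in FIG. 10, six masking cycles therefore suffice to produce all eight trimers. 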
The above example has illustrated the method by way of 
a manual example. It will of course be appreciated that 
automated or semi-automated methods could be used. The 
substrate would be mounted in a flow cell for automated 
addition and removal of reagents, to minimize the volume of 
reagents needed, and to more carefully control reaction 
conditions. Successive masks could be applied manually or 
automatically. 

C. Synthesis of a Dimer of an Aminopropyl Group and a 
Fluorescent Group 

In synthesizing the dimer of an aminopropyl group and a 
fluorescent group, a functionalized durapore membrane was 
used as a substrate. The durapore membrane was a 
polyvinylidene difluoride with aminopropyl groups. The 
aminopropyl groups were protected with the DDZ group by 
reaction of the carbonyl chloride with the amino groups, a 
reaction readily known to those of skill in the art. The 
surface bearing these groups was placed in a solution of THF 
and contacted with a mask bearing a checkerboard pattern of 
1 mm opaque and transparent regions. The mask was 
exposed to ultraviolet light having a wavelength down to at 
least about 280 nm for about 5 minutes at ambient 
temperature, although a wide range of exposure times and 
temperatures may be appropriate in various embodiments of 
the invention. For example, in one embodiment, an exposure 
time of between about 1 and 5000 seconds may be used at 
process temperatures of between -70 and +50° C. 

In one preferred embodiment, exposure times of between 
about 1 and 500 seconds at about ambient pressure are used. 
In some preferred embodiments, pressure above ambient is 
used to prevent evaporation. 

The surface of the membrane was then washed for about 
1 hour with a fluorescent label which included an active ester 
bound to a chelate of a lanthanide. Wash times will vary over 
a wide range of values from about a few minutes to a few 
hours. These materials fluoresce in the red and the green 
visible region. After the reaction with the active ester in the 
fluorophore was complete, the locations in which the 
fluorophore was bound could be visualized by exposing them to 
ultraviolet light and observing the red and the green fluo- 
rescence. It was observed that the derivatized regions of the 
substrate closely corresponded to the original pattern of the 
mask. 

D. Demonstration of Signal Capability 

Signal detection capability was demonstrated using a 
low-level standard fluorescent bead kit manufactured by 
Flow Cytometry Standards and having model no. 824. This 
kit includes 5.8 µm diameter beads, each impregnated with 
a known number of fluorescein molecules. 

One of the beads was placed in the illumination field on 
the scan stage as shown in FIG. 9 in a field of a laser spot 
which was initially shuttered. After being positioned in the 
illumination field, the photon detection equipment was 
turned on. The laser beam was unblocked and it interacted 
with the particle bead, which then fluoresced. Fluorescence 
curves of beads impregnated with 7,000; 13,000; and 29,000 
fluorescein molecules are shown in FIGS. 11A, 11B, and 
11C, respectively. On each curve, traces for beads without 
fluorescein molecules are also shown. These experiments 
were performed with 488 nm excitation, with 100 µW of 
laser power. The light was focused through a 40 power 0.75 
NA objective. 

The fluorescence intensity in all cases started off at a high 
value and then decreased exponentially. The fall-off in 
intensity is due to photobleaching of the fluorescein mol- 
ecules. The traces of beads without fluorescein molecules 
are used for background subtraction. The difference in the 
initial exponential decay between labeled and nonlabeled 
beads is integrated to give the total number of photon counts, 
and this number is related to the number of molecules per 
bead. Therefore, it is possible to deduce the number of 
photons per fluorescein molecule that can be detected. For 
the curves illustrated in FIG. 11, this calculation indicates 
that about 40 to 50 photons per fluorescein molecule are 
detected. 
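
The background-subtracted integration described above can be sketched numerically. The following Python example is a minimal illustration under stated assumptions, not the procedure actually used: it assumes evenly spaced samples, trapezoid-rule integration, and an invented synthetic decay curve, and the function names are hypothetical. 

```python
import math


def trapezoid(values, dt):
    """Integrate evenly spaced samples with the trapezoid rule."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


def photons_per_molecule(labeled, blank, dt, molecules_per_bead):
    """Integrate (labeled - blank) counts and divide by molecules per bead."""
    if len(labeled) != len(blank):
        raise ValueError("traces must have the same number of samples")
    difference = [s - b for s, b in zip(labeled, blank)]
    return trapezoid(difference, dt) / molecules_per_bead


# Synthetic traces for illustration only (not measured data):
# a flat background plus an exponentially decaying fluorescence signal.
dt = 0.1  # assumed sample spacing, arbitrary time units
times = [i * dt for i in range(200)]
blank = [50.0 for _ in times]
labeled = [50.0 + 1000.0 * math.exp(-t / 2.0) for t in times]

estimate = photons_per_molecule(labeled, blank, dt, molecules_per_bead=7000)
print(f"{estimate:.3f} detected photons per molecule (synthetic data)")
```

Subtracting the unlabeled-bead trace before integrating removes the background contribution, so only the photobleaching decay of the fluorescein contributes to the total. 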

E. Determination of the Number of Molecules Per Unit Area 

Aminopropylated glass microscope slides prepared 
according to the methods discussed above were utilized in 
order to establish the density of labeling of the slides. The 
free amino termini of the slides were reacted with FITC 
(fluorescein isothiocyanate) which forms a covalent linkage 
with the amino group. The slide is then scanned to count the 
number of fluorescent photons generated in a region which, 
using the estimated 40-50 photons per fluorescent molecule, 
enables the calculation of the number of molecules which 
are on the surface per unit area. 
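
As a rough illustration of this calculation, the sketch below converts an integrated photon count for a scanned region into a range of labeling densities, using the 40 to 50 detected photons per molecule estimated in Section D. The function names, the choice of area units, and the input values are assumptions made for the example only. 

```python
def labeling_density(photon_counts, area, photons_per_molecule):
    """Molecules per unit area from integrated photon counts in a region."""
    molecules = photon_counts / photons_per_molecule
    return molecules / area


def density_bounds(photon_counts, area, low=40.0, high=50.0):
    """Lower and upper density estimates for 40-50 photons per molecule.

    More photons per molecule implies fewer molecules, so the `high`
    yield gives the lower density bound.
    """
    return (
        labeling_density(photon_counts, area, high),
        labeling_density(photon_counts, area, low),
    )


# Hypothetical numbers: 2,000 integrated counts from one unit of area.
lower, upper = density_bounds(2000.0, 1.0)
print(lower, upper)  # 40.0 50.0 molecules per unit area
```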

A slide with aminopropyl silane on its surface was 
immersed in a 1 mM solution of FITC in DMF for 1 hour at 
about ambient temperature. After reaction, the slide was 
washed twice with DMF and then washed with ethanol, 
water, and then ethanol again. It was then dried and stored 
in the dark until it was ready to be examined. 

Through the use of curves similar to those shown in FIG. 
11, and by integrating the fluorescent counts under the 
exponentially decaying signal, the number of free amino 
groups on the surface after derivatization was determined. It 
was determined that slides with labeling densities of 1 
fluorescein per 10^x10^ to ~2x2 nm could be reproducibly 
made as the concentration of aminopropyltriethoxysilane 
varied from 10"^% to 10"^%. 

F. Removal of NVOC and Attachment of a Fluorescent 
Marker 

NVOC-GABA groups were attached as described above. 
The entire surface of one slide was exposed to light so as to 
expose a free amino group at the end of the gamma amino 
butyric acid. This slide, and a duplicate which was not 
exposed, were then exposed to fluorescein isothiocyanate 
(FITC). 

FIG. 12A illustrates the slide which was not exposed to 
light, but which was exposed to FITC. The units of the x axis 
are time and the units of the y axis are counts. The trace 
contains a certain amount of background fluorescence. The 
duplicate slide was exposed to 350 nm broadband 
illumination for about 1 minute (12 mW/cm², ~350 nm 
illumination), washed and reacted with FITC. The 
fluorescence curves for this slide are shown in FIG. 12B. A large 
increase in the level of fluorescence is observed, which 
indicates photolysis has exposed a number of amino groups 
on the surface of the slides for attachment of a fluorescent 
marker. 

G. Use of a Mask in Removal of NVOC 

The next experiment was performed with a 0.1% 
aminopropylated slide. Light from a Hg-Xe arc lamp was imaged 
onto the substrate through a laser-ablated chrome-on-glass 
mask in direct contact with the substrate. 

This slide was illuminated for approximately 5 minutes, 
with 12 mW of 350 nm broadband light and then reacted 
with the 1 mM FITC solution. It was put on the laser 
detection scanning stage and a graph was plotted as a 
two-dimensional representation of position versus 
fluorescence intensity. The fluorescence intensity (in counts) as a 
function of location is given on the scale to the right of FIG. 
13A for a mask having 100×100 µm squares. 

The experiment was repeated a number of times through 
various masks. The fluorescence pattern for a 50 µm mask is 
illustrated in FIG. 13B, for a 20 µm mask in FIG. 13C, and 
for a 10 µm mask in FIG. 13D. The mask pattern is distinct 
down to at least about 10 µm squares using this lithographic 
technique. 
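
To put the mask dimensions in terms of array density, the following Python sketch computes how many square regions of a given side length would fit in one square centimeter. It assumes, for illustration only, that the squares tile the surface edge to edge with no gaps; the function name is hypothetical. 

```python
def regions_per_cm2(square_side_um: int) -> int:
    """Square regions per cm^2, assuming gap-free tiling (1 cm = 10,000 um)."""
    per_side = 10_000 // square_side_um
    return per_side ** 2


# Mask square sizes used in the experiments of FIGS. 13A-13D.
for side in (100, 50, 20, 10):
    print(f"{side:>3} um squares: {regions_per_cm2(side):,} regions per cm^2")
```

Under that tiling assumption, 100 µm squares correspond to 10,000 regions per cm² and 10 µm squares to 1,000,000 regions per cm². 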

H. Attachment of YGGFL and Subsequent Exposure to Herz 
Antibody and Goat Anti-Mouse 

In order to establish that receptors to a particular 
polypeptide sequence would bind to a surface-bound peptide and be 
detected, Leu enkephalin was coupled to the surface and 
recognized by an antibody. A slide was derivatized with 
0.1% aminopropyltriethoxysilane and protected with 
NVOC. A 500 µm checkerboard mask was used to expose 
the slide in a flow cell using backside contact printing. The 
Leu enkephalin sequence (H2N-tyrosine, glycine, glycine, 
phenylalanine, leucine-CO2H, otherwise referred to herein as 
YGGFL) was attached via its carboxy end to the exposed 
amino groups on the surface of the slide. The peptide was 
added in DMF solution with the BOP/HOBT/DIEA 
coupling reagents and recirculated through the flow cell for 2 
hours at room temperature. 

A first antibody, known as the Herz antibody, was applied 
to the surface of the slide for 45 minutes at 2 µg/ml in a 
supercocktail (containing 1% BSA and 1% ovalbumin also 
in this case). A second antibody, goat anti-mouse fluorescein 
conjugate, was then added at 2 µg/ml in the supercocktail 
buffer, and allowed to incubate for 2 hours. 

The results of this experiment are provided in FIG. 14. 
Again, this figure illustrates fluorescence intensity as a 
function of position. The fluorescence scale is shown on the 
right. This image was taken at 10 µm steps. This figure 
indicates that not only can deprotection be carried out in a 
well defined pattern, but also that (1) the method provides 
for successful coupling of peptides to the surface of the 
substrate, (2) the surface of a bound peptide is available for 
binding with an antibody, and (3) the detection 
apparatus capabilities are sufficient to detect binding of a 
receptor. 

I. Monomer-by-Monomer Formation of YGGFL and 
Subsequent Exposure to Labeled Antibody 

Monomer-by-monomer synthesis of YGGFL and GGFL 
in alternate squares was performed on a slide in a 
checkerboard pattern and the resulting slide was exposed to the Herz 
antibody. This experiment and the results thereof are 
illustrated in FIGS. 15A, 15B, 15C, and 15D. 

In FIG. 15A, a slide is shown which is derivatized with 
the aminopropyl group, protected in this case with t-BOC 
(t-butoxycarbonyl). The slide was treated with TFA to 
remove the t-BOC protecting group. ε-Aminocaproic acid, 
which was t-BOC protected at its amino group, was then 
coupled onto the aminopropyl groups. The aminocaproic 
acid serves as a spacer between the aminopropyl group and 
the peptide to be synthesized. The amino end of the spacer 
was deprotected and coupled to NVOC-leucine. The entire 
slide was then illuminated with 12 mW of 325 nm 
broadband illumination. The slide was then coupled with 
NVOC-phenylalanine and washed. The entire slide was again 
illuminated, then coupled to NVOC-glycine and washed. 
The slide was again illuminated and coupled to 
NVOC-glycine to form the sequence shown in the last portion of 
FIG. 15A. 

As shown in FIG. 15B, alternating regions of the slide 
were then illuminated using a projection print using a 
500×500 µm checkerboard mask; thus, the amino group of 
glycine was exposed only in the lighted areas. When the next 
coupling chemistry step was carried out, NVOC-tyrosine 
was added, and it coupled only at those spots which had 
received illumination. The entire slide was then illuminated 
to remove all the NVOC groups, leaving a checkerboard of 
YGGFL in the lighted areas and, in the other areas, GGFL. 
The Herz antibody (which recognizes the YGGFL, but not 
GGFL) was then added, followed by goat anti-mouse 
fluorescein conjugate. 

The resulting fluorescence scan is shown in FIG. 15C, and 
the scale for the fluorescence intensity is again given on the 
right. Dark areas contain the tetrapeptide GGFL, which is 
not recognized by the Herz antibody (and thus there is no 
binding of the goat anti-mouse antibody with fluorescein 
conjugate), and in the red areas YGGFL is present. The 
YGGFL pentapeptide is recognized by the Herz antibody 
and, therefore, there is antibody in the lighted regions for the 
fluorescein-conjugated goat anti-mouse to recognize. 

Similar patterns are shown for a 50 µm mask used in 
direct contact ("proximity print") with the substrate in FIG. 
15D. Note that the pattern is more distinct and the corners 
of the checkerboard pattern are touching when the mask is 
in direct contact with the substrate (which reflects the 
increase in resolution using this technique). 

J. Monomer-by-Monomer Synthesis of YGGFL and PGGFL 

A synthesis using a 50 µm checkerboard mask similar to 
that shown in FIG. 15 was conducted. However, P was added 
to the GGFL sites on the substrate through an additional 
coupling step. P was added by exposing protected GGFL to 
light through a mask, and subsequent exposure to P in the 
manner set forth above. Therefore, half of the regions on the 
substrate contained YGGFL and the remaining half 
contained PGGFL. 

The fluorescence plot for this experiment is provided in 
FIG. 16. As shown, the regions are again readily discernible. 
This experiment demonstrates that antibodies are able to 
recognize a specific sequence and that the recognition is not 
length-dependent. 

K. Monomer-by-Monomer Synthesis of YGGFL and 
YPGGFL 

In order to further demonstrate the operability of the 
invention, a 50 µm checkerboard pattern of alternating 
YGGFL and YPGGFL was synthesized on a substrate using 
techniques like those set forth above. The resulting 
fluorescence plot is provided in FIG. 17. Again, it is seen that the 
antibody is clearly able to recognize the YGGFL sequence 
and does not bind significantly at the YPGGFL regions. 

L. Synthesis of an Array of Sixteen Different Amino Acid 
Sequences and Estimation of Relative Binding Affinity to 
Herz Antibody 

Using techniques similar to those set forth above, an array 
of 16 different amino acid sequences (replicated four times) 
was synthesized on each of two glass substrates. The 
sequences were synthesized by attaching the sequence 
NVOC-GFL across the entire surface of the slides. Using a 
series of masks, two layers of amino acids were then 
selectively applied to the substrate. Each region had 
dimensions of 0.25 cm × 0.0625 cm. The first slide contained amino 
acid sequences containing only L amino acids while the 
second slide contained selected D amino acids. FIGS. 18A 
and 18B illustrate a map of the various regions on the first 
and second slides, respectively. The patterns shown in FIGS. 
18A and 18B were duplicated four times on each slide. The 
slides were then exposed to the Herz antibody and 
fluorescein-labeled goat anti-mouse. 

FIG. 19 is a fluorescence plot of the first slide, which 
contained only L amino acids. Red indicates strong binding 
(149,000 counts or more) while black indicates little or no 
binding of the Herz antibody (20,000 counts or less). The 
bottom right-hand portion of the slide appears "cut off" 
because the slide was broken during processing. The 
sequence YGGFL is clearly most strongly recognized. The 
sequences YAGFL and YSGFL also exhibit strong 
recognition of the antibody. By contrast, most of the remaining 
sequences show little or no binding. The four duplicate 
portions of the slide are extremely consistent in the amount 
of binding shown therein. 

FIG. 20 is a fluorescence plot of the second slide. Again, 
strongest binding is exhibited by the YGGFL sequence. 
Significant binding is also detected to YaGFL, YsGFL, and 
YpGFL. The remaining sequences show less binding with 
the antibody. Note the low binding efficiency of the 
sequence yGGFL. 

Table 6 lists the various sequences tested in order of 
relative fluorescence, which provides information regarding 
relative binding affinity. 

TABLE 6 

Apparent Binding to Herz Ab 

    L-a.a. Set        D-a.a. Set 

    YGGFL             YGGFL 
    YAGFL             YaGFL 
    YSGFL             YsGFL 
    LGGFL             YpGFL 
    FGGFL             fGGFL 
    YPGFL             yGGFL 
    LAGFL             feGFL 
    FAGFL             WGGFL 
    WGGFL             yaGFL 
                      ^FL 
                      WaGFL 

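
The arithmetic of the sixteen-sequence array can be sketched as follows. This Python example rests on an assumption that is not stated in the text: that the two masked layers each drew on four residues, taken here as G, A, S, P for the layer coupled to GFL and Y, L, F, W for the terminal layer, a choice that is merely consistent with the L-amino-acid sequences listed in Table 6. The variable names are hypothetical. 

```python
# Assumed residue sets (inferred from Table 6, not stated in the text).
first_layer = ["G", "A", "S", "P"]    # residue coupled directly to GFL
second_layer = ["Y", "L", "F", "W"]   # terminal residue

# Every terminal residue combined with every first-layer residue.
sequences = [t + f + "GFL" for t in second_layer for f in first_layer]
print(len(sequences))   # 16 different sequences
print(sequences[:4])    # ['YGGFL', 'YAGFL', 'YSGFL', 'YPGFL']

# Area covered: 16 sequences, replicated four times, each region
# 0.25 cm x 0.0625 cm as stated in Section L.
region_area_cm2 = 0.25 * 0.0625
total_area_cm2 = len(sequences) * 4 * region_area_cm2
print(total_area_cm2)   # 1.0 cm^2
```

On these assumptions the 64 regions occupy one square centimeter of each slide. 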
VIII. Illustrative Alternative Embodiment 

According to an alternative embodiment of the invention, 
the methods provide for attaching to the surface a caged 
binding member which in its caged form has a relatively low 
affinity for other potentially binding species, such as 
receptors and specific binding substances. Such techniques are 
more fully described in copending application Ser. No. 
404,920, filed Sep. 8, 1989, and incorporated herein by 
reference for all purposes. 

According to this alternative embodiment, the invention 
provides methods for forming predefined regions on a 
surface of a solid support, wherein the predefined regions are 
capable of immobilizing receptors. The methods make use 
of caged binding members attached to the surface to enable 
selective activation of the predefined regions. The caged 
binding members are liberated to act as binding members 
ultimately capable of binding receptors upon selective acti- 
vation of the predefined regions. The activated binding 
members are then used to immobilize specific molecules 
such as receptors on the predefined region of the surface. 
The above procedure is repeated at the same or different sites 
on the surface so as to provide a surface prepared with a 
plurality of regions on the surface containing, for example, 
the same or different receptors. When receptors immobilized 
in this way have a differential affinity for one or more 
ligands, screenings and assays for the ligands can be 
conducted in the regions of the surface containing the receptors. 

The alternative embodiment may make use of novel caged 
binding members attached to the substrate. Caged 
(unactivated) members have a relatively low affinity for 
receptors of substances that specifically bind to uncaged 
binding members when compared with the corresponding 
affinities of activated binding members. Thus, the binding 
members are protected from reaction until a suitable source 
of energy is applied to the regions of the surface desired to 
be activated. Upon application of a suitable energy source, 
the caging groups labilize, thereby presenting the activated 
binding member. A typical energy source will be light. 

Once the binding members on the surface are activated 
they may be attached to a receptor. The receptor chosen may 
be a monoclonal antibody, a nucleic acid sequence, a drug 
receptor, etc. The receptor will usually, though not always, 
be prepared so as to permit attaching it, directly or indirectly, 
to a binding member. For example, a specific binding 
substance having a strong binding affinity for the binding 
member and a strong affinity for the receptor or a conjugate 
of the receptor may be used to act as a bridge between 
binding members and receptors if desired. The method uses 
a receptor prepared such that the receptor retains its activity 
toward a particular ligand. 

Preferably, the caged binding member attached to the 
solid substrate will be a photoactivatable biotin complex. 
i.e., a biotin molecule that has been chemically modified 
with photoactivatable protecting groups so that it has a 
significantly lower binding affinity for avidin or avidin 
analogs than does natural biotin. In a preferred embodiment, 
the protecting groups localized in a predefined region of the 
surface will be removed upon application of a suitable 
source of radiation to give binding members that are biotin 
or a functionally analogous compound having substantially 
the same binding affinity for avidin or avidin analogs as does 
biotin. 

In another preferred embodiment, avidin or an avidin 
analog is incubated with activated binding members on the 
surface until the avidin binds strongly to the binding mem- 
bers. The avidin so immobilized on predefined regions of the 
surface can then be incubated with a desired receptor or 
conjugate of a desired receptor. The receptor will preferably 
be biotinylated, e.g., a biotinylated antibody, when avidin is 
immobilized on the predefined regions of the surface. 
Alternatively, a preferred embodiment will present an 
avidin/biotinylated receptor complex, which has been pre- 
viously prepared, to activated binding members on the 
surface. 

IX. Conclusion 

The present inventions provide greatly improved methods 
and apparatus for synthesis of polymers on substrates. It is 
to be understood that the above description is intended to be 
illustrative and not restrictive. Many embodiments will be 
apparent to those of skill in the art upon reviewing the above 
description. By way of example, the invention has been 
described primarily with reference to the use of photore- 
movable protective groups, but it will be readily recognized 
by those of skill in the art that sources of radiation other than 
light could also be used. For example, in some embodiments 
it may be desirable to use protective groups which are 
sensitive to electron beam irradiation or x-ray irradiation, in 
combination with electron beam lithography or x-ray 
lithography techniques. Alternatively, the group could be removed 
by exposure to an electric current. The scope of the invention 
should, therefore, be determined not with reference to the 
above description, but should instead be determined with 
reference to the appended claims, along with the full scope 
of equivalents to which such claims are entitled. 

What is claimed is: 

1. An array of oligonucleotides, the array comprising: 

a planar solid support having at least a first surface; and 
a plurality of different oligonucleotides attached to the 
first surface of the solid support at a density exceeding 
400 different oligonucleotides/cm², wherein each of the 
different oligonucleotides is attached to the surface of 
the solid support in a different known location, and has 
a different determinable sequence. 

2. The array of claim 1, wherein each different 
oligonucleotide is from about 4 to about 20 nucleotides in length. 

3. The array of claim 1, wherein each different oligo- 
nucleotide is at least 12 nucleotides in length. 

4. The array of claim 1, wherein each different oligo- 
nucleotide is 2-100 nucleotides in length. 

5. The array of claim 1, wherein the array comprises at 
least 1,000 different oligonucleotides attached to the first 
surface of the solid support. 

6. The array of claim 1, wherein the array comprises at 
least 10,000 different oligonucleotides attached to the first 
surface of the solid support. 

7. The array of claim 1, wherein each of the different 
known locations is physically separated from each other of 
the known locations. 

8. The array of claim 1, wherein said planar solid support 
is glass. 
9. The array of claim 1, wherein said oligonucleotides are 
attached to the first surface of the solid support through a 
linker group. 

10. The array of claim 1, wherein the oligonucleotide in 
the different known locations are at least 20% pure. 

11. The array of claim 1, wherein the oligonucleotides in 
the different known locations are at least 50% pure. 

12. The array of claim 1, wherein the oligonucleotide in 
the different known locations are at least 80% pure. 

13. The array of claim 1, wherein the oligonucleotide in 
the different known locations are at least 90% pure. 

14. The array of claim 1, wherein said array is produced 
by a binary synthesis process, said process comprising the 
steps of: 

providing a planar solid support, said solid support having 
a plurality of compounds immobilized on a surface 
thereof, said compounds having protecting groups 
coupled thereto; 

deprotecting a first portion of said plurality of compounds 
on said surface and not a second portion of said 
plurality of compounds; 

reacting said first portion of said plurality of compounds 
with a first component of said oligonucleotide; 

deprotecting at least a third portion of said plurality of 
compounds on said surface, said third portion compris- 
ing a fraction of said first portion of said plurality of 
compounds; 

reacting said at least third portion of said plurality of 
compounds with a second component of said oligo- 
nucleotide; and 

optionally repeating said binary synthesis steps to produce 
said oligonucleotide array. 

15. An array of nucleic acids, the array comprising: 
a planar support having at least a first surface; and 

a plurality of different nucleic acids attached to the first 
surface of the solid support at a density exceeding 400 
different nucleic acids/cm², wherein each of the 
different nucleic acids is attached to the surface of the solid 
support in a different known location, has a different 
determinable sequence, wherein the different nucleic 
acids in the different known locations are at least 10% 
pure. 

16. The array of claim 15, wherein each different nucleic 
acid is at least 20 nucleotides in length. 

17. The array of claim 15, wherein the array comprises at 
least 1,000 different nucleic acids attached to the first surface 
of the solid support. 

18. The array of claim 15, wherein the array comprises at 
least 10,000 different nucleic acids attached to the first 
surface of the solid support. 

19. The array of claim 15, wherein each of the different 
known locations is physically separated from each of the 
other known locations. 

20. The array of claim 15, wherein said planar solid 
support is glass. 

21. The array of claim 15, wherein said nucleic acids are 
attached to the first surface of the solid support through a 
linker group. 

22. The array of claim 15, wherein the nucleic acids in the 
different known locations comprise nucleic acids that are at 
least 20% pure. 

23. The array of claim 15, wherein the nucleic acid in the 
different known locations comprise nucleic acids that are at 
least 50% pure. 
24. The array of claim 15, wherein the nucleic acids in the 
different known locations are at least 80% pure. 

25. The array of claim 15, the nucleic acids in the different 
known locations are at least 90% pure. 

26. The array of claim 15, wherein said array is produced 
by a binary synthesis process, said process comprising the 
steps of: 

providing a planar, solid support, said solid support 
having a plurality of compounds immobilized on a surface 
thereof, said compounds having protecting groups 
coupled thereto; deprotecting a first portion of said 
plurality of compounds on said surface and not a 
second portion of said plurality of compounds; 

reacting said first portion of said plurality of compounds 
with a first reactant; 

deprotecting at least a third portion of said plurality of 
compounds on said surface, said third portion compris- 
ing a fraction of said first portion of said plurality of 
compounds; 

reacting said at least third portion of said plurality of 
compounds with a second reactant; and 

optionally repeating said binary synthesis steps to produce 
said nucleic acid array. 

27. The array of claim 15, wherein the nucleic acids are 
covalently attached to the support. 

28. An array of nucleic acids, the array comprising: 
a planar support having at least a first surface; and 

a plurality of different nucleic acids attached to the first 
surface of the solid support at a density exceeding 
10,000 different nucleic acids/cm², wherein each of the 
different nucleic acids is attached to the surface of the 
solid support in a different known location, and has a 
different determinable sequence. 

29. An array of nucleic acids, the array comprising: 
a planar support having at least a first surface; and 

a plurality of different nucleic acids attached to the first 
surface of the solid support at a density exceeding 400 
different nucleic acids/cm², wherein each of the different 
nucleic acids is attached to the surface of the solid 
support in a different known location, has a different 
determinable sequence, wherein the surface and the 
support are made from different materials. 

30. The array of claim 15, wherein the different known 
locations are square in shape. 

31. The array of claim 15, wherein the substrate is glass. 

32. The array of claim 15, wherein the substrate is silicon 
dioxide. 

33. The array of claim 15, wherein the substrate is 
(poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene 
or polycarbonate. 

34. The method of claim 15, wherein the substrate is 
optically transparent. 

35. The array of claim 15, wherein the substrate is 
functionalized with groups that attach to the plurality of 
different nucleic acids. 

36. The array of claim 1, wherein the plurality of different 
oligonucleotides have known sequences. 

37. The array of claim 15, wherein the plurality of 
different nucleic acids have known sequences. 

38. The array of claim 28, wherein the plurality of 
different nucleic acids have known sequences. 

39. The array of claim 29, wherein the plurality of 
different nucleic acids have known sequences. 
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THE HUMAN GENOME 

A 2.91-billion base pair (bp) consensus sequence of the euchromatic
portion of the human genome was generated by the whole-genome shotgun
sequencing method. The 14.8-billion bp DNA sequence was generated over
9 months from 27,271,853 high-quality sequence reads (5.11-fold
coverage of the genome) from both ends of plasmid clones made from the
DNA of five individuals. Two assembly strategies (a whole-genome
assembly and a regional chromosome assembly) were used, each combining
sequence data from Celera and the publicly funded genome effort. The
public data were shredded into 550-bp segments to create a 2.9-fold
coverage of those genome regions that had been sequenced, without
including biases inherent in the cloning and assembly procedure used by
the publicly funded group. This brought the effective coverage in the
assemblies to eightfold, reducing the number and size of gaps in the
final assembly over what would be obtained with 5.11-fold coverage. The
two assembly strategies yielded very similar results that largely agree
with independent mapping data. The assemblies effectively cover the
euchromatic regions of the human chromosomes. More than 90% of the
genome is in scaffold assemblies of 100,000 bp or more, and 25% of the
genome is in scaffolds of 10 million bp or larger. Analysis of the
genome sequence revealed 26,588 protein-encoding transcripts for which
there was strong corroborating evidence and an additional ~12,000
computationally derived genes with mouse matches or other weak
supporting evidence. Although gene-dense clusters are obvious, almost
half the genes are dispersed in low G+C sequence separated by large
tracts of apparently noncoding sequence. Only 1.1% of the genome is
spanned by exons, whereas 24% is in introns, with 75% of the genome
being intergenic DNA. Duplications of segmental blocks, ranging in size
up to chromosomal lengths, are abundant throughout the genome and
reveal a complex evolutionary history. Comparative genomic analysis
indicates vertebrate expansions of genes associated with neuronal
function, with tissue-specific developmental regulation, and with the
hemostasis and immune systems. DNA sequence comparisons between the
consensus sequence and publicly funded genome data provided locations
of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of
human haploid genomes differed at a rate of 1 bp per 1250 on average,
but there was marked heterogeneity in the level of polymorphism across
the genome. Less than 1% of all SNPs resulted in variation in proteins,
but the task of determining which SNPs have functional consequences
remains an open challenge.



Decoding of the DNA that constitutes the human genome has been widely
anticipated for the contribution it will make toward understanding
human evolution, the causation of disease, and the interplay between
the environment and heredity in defining the human condition. A project
with the goal of determining the complete nucleotide sequence of the
human genome was first formally proposed in 1985 (1). In subsequent
years, the idea met with mixed reactions in the scientific community
(2). However, in 1990, the Human Genome Project (HGP) was officially
initiated in the United States under the direction of the National
Institutes of Health and the U.S. Department of Energy with a 15-year,
$3 billion plan for completing the genome sequence. In 1998 we
announced our intention to build a unique genome-sequencing facility,
to determine the sequence of the human genome over a 3-year period.
Here we report the penultimate milestone along the path toward that
goal, a nearly complete sequence of the euchromatic portion of the
human genome. The sequencing was performed by a whole-genome random
shotgun method with subsequent assembly of the sequenced segments.

The modern history of DNA sequencing began in 1977, when Sanger
reported his method for determining the order of nucleotides of DNA
using chain-terminating nucleotide analogs (3). In the same year, the
first human gene was isolated and sequenced (4). In 1986, Hood and
co-workers (5) described an improvement in the Sanger sequencing method
that included attaching fluorescent dyes to the nucleotides, which
permitted them to be sequentially read by a computer. The first
automated DNA sequencer, developed by Applied Biosystems in California
in 1987, was shown to be successful when the sequences of two genes
were obtained with this new technology (6). From early sequencing of
human genomic regions (7), it became clear that cDNA sequences (which
are reverse-transcribed from RNA) would be essential to annotate and
validate gene predictions in the human genome. These studies were the
basis in part for the development of the expressed sequence tag (EST)
method of gene identification (8), which is a random selection, very
high throughput sequencing approach to characterize cDNA libraries. The
EST method led to the rapid discovery and mapping of human genes (9).
The increasing numbers of human EST sequences necessitated the
development of new computer algorithms to analyze large amounts of
sequence data, and in 1993 at The Institute for Genomic Research
(TIGR), an algorithm was developed that permitted assembly and analysis
of hundreds of thousands of ESTs. This algorithm permitted
characterization and annotation of human genes on the basis of 30,000
EST assemblies (10).

The complete 49-kbp bacteriophage lambda genome sequence was determined
by a shotgun restriction digest method in 1982 (11). When considering
methods for sequencing the smallpox virus genome in 1991 (12), a
whole-genome shotgun sequencing method was discussed and subsequently
rejected owing to the lack of appropriate software tools for genome
assembly. However, in 1994, when a microbial genome-sequencing project
was contemplated at TIGR, a whole-genome shotgun sequencing approach
was considered possible with the TIGR EST assembly algorithm. In 1995,
the 1.8-Mbp Haemophilus influenzae genome was completed by a
whole-genome shotgun sequencing method (13). The experience with
several subsequent genome-sequencing efforts established the broad
applicability of this approach (14, 15).

A key feature of the sequencing approach used for these megabase-size
and larger genomes was the use of paired-end sequences (also called
mate pairs), derived from subclone libraries with distinct insert sizes
and cloning characteristics. Paired-end sequences are sequences 500 to
600 bp in length from both ends of double-stranded DNA clones of
prescribed lengths. The success of using end sequences from long
segments (18 to 20 kbp) of DNA cloned into bacteriophage lambda in
assembly of the microbial genomes led to the suggestion (16) of an
approach to simultaneously map and sequence the human genome by means
of end sequences from 150-kbp bacterial artificial chromosomes (BACs)
(17, 18). The end sequences spanned by known distances provide
long-range continuity across the genome. A modification of the BAC
end-sequencing (BES) method was applied successfully to complete
chromosome 2 from the Arabidopsis thaliana genome (19). In 1997, Weber
and Myers (20) proposed whole-genome shotgun sequencing of the human
genome. Their proposal was not well received (21). However, by early
1998, as less than 5% of the genome had been sequenced, it was clear
that the rate of progress in human genome sequencing worldwide was very
slow (22), and the prospects for finishing the genome by the 2005 goal
were uncertain.
In early 1998, PE Biosystems (now Applied Biosystems) developed an
automated, high-throughput capillary DNA sequencer, subsequently called
the ABI PRISM 3700 DNA Analyzer. Discussions between PE Biosystems and
TIGR scientists resulted in a plan to undertake the sequencing of the
human genome with the 3700 DNA Analyzer and the whole-genome shotgun
sequencing techniques developed at TIGR (23). Many of the principles of
operation of a genome-sequencing facility were established in the TIGR
facility (24). However, the facility envisioned for Celera would have a
capacity roughly 50 times that of TIGR, and thus new developments were
required for sample preparation and tracking and for whole-genome
assembly. Some argued that the required 150-fold scale-up from the
H. influenzae genome to the human genome with its complex repeat
sequences was not feasible (25). The Drosophila melanogaster genome was
thus chosen as a test case for whole-genome assembly on a large and
complex eukaryotic genome. In collaboration with Gerald Rubin and the
Berkeley Drosophila Genome Project, the nucleotide sequence of the
120-Mbp euchromatic portion of the Drosophila genome was determined
over a 1-year period (26-28). The Drosophila genome-sequencing effort
resulted in two key findings: (i) that the assembly algorithms could
generate chromosome assemblies with highly accurate order and
orientation with substantially less than 10-fold coverage, and (ii)
that undertaking multiple interim assemblies in place of one
comprehensive final assembly was not of value.

These findings, together with the dramatic changes in the public genome
effort subsequent to the formation of Celera (29), led to a modified
whole-genome shotgun sequencing approach to the human genome. We
initially proposed to do 10-fold sequence coverage of the genome over a
3-year period and to make interim assembled sequence data available
quarterly. The modifications included a plan to perform random shotgun
sequencing to ~5-fold coverage and to use the unordered and unoriented
BAC sequence fragments and subassemblies published in GenBank by the
publicly funded genome effort (30) to accelerate the project. We also
abandoned the quarterly announcements in the absence of interim
assemblies to report.

Although this strategy provided a reasonable result very early that was
consistent with a whole-genome shotgun assembly with eightfold
coverage, the human genome sequence is not as finished as the
Drosophila genome was with an effective 13-fold coverage. However, it
became clear that even with this reduced coverage strategy, Celera
could generate an accurately ordered and oriented scaffold sequence of
the human genome in less than 1 year. Human genome sequencing was
initiated 8 September 1999 and completed 17 June 2000. The first
assembly was completed 25 June 2000, and the assembly reported here was
completed 1 October 2000. Here we describe the whole-genome random
shotgun sequencing effort applied to the human genome. We developed two
different assembly approaches for assembling the ~3 billion bp that
make up the 23 pairs of chromosomes of the Homo sapiens genome. Any
GenBank-derived data were shredded to remove potential bias to the
final sequence from chimeric clones, foreign DNA contamination, or
misassembled contigs. Insofar as a correctly and accurately assembled
genome sequence with faithful order and orientation of contigs is
essential for an accurate analysis of the human genetic code, we have
devoted a considerable portion of this manuscript to the documentation
of the quality of our reconstruction of the genome. We also describe
our preliminary analysis of the human genetic code on the basis of
computational methods. Figure 1 (see fold-out chart associated with
this issue; files for each chromosome can be found in Web fig. 1 on
Science Online at
www.sciencemag.org/cgi/content/full/291/5507/1304/DC1) provides a
graphical overview of the genome and the features encoded in it. The
detailed manual curation and interpretation of the genome are just
beginning.

To aid the reader in locating specific analytical sections, we have
divided the paper into seven broad sections. A summary of the major
results appears at the beginning of each section.

1 Sources of DNA and Sequencing Methods
2 Genome Assembly Strategy and Characterization
3 Gene Prediction and Annotation
4 Genome Structure
5 Genome Evolution
6 A Genome-Wide Examination of Sequence Variations
7 An Overview of the Predicted Protein-Coding Genes in the Human Genome
8 Conclusions



1 Sources of DNA and Sequencing Methods

Summary. This section discusses the rationale and ethical rules
governing donor selection to ensure ethnic and gender diversity along
with the methodologies for DNA extraction and library construction. The
plasmid library construction is the first critical step in shotgun
sequencing. If the DNA libraries are not uniform in size, nonchimeric,
and do not randomly represent the genome, then the subsequent steps
cannot accurately reconstruct the genome sequence. We used automated
high-throughput DNA sequencing and the computational infrastructure to
enable efficient tracking of enormous amounts of sequence information
(27.3 million sequence reads; 14.9 billion bp of sequence). Sequencing
and tracking from both ends of plasmid clones from 2-, 10-, and 50-kbp
libraries were essential to the computational reconstruction of the
genome. Our evidence indicates that the accurate pairing rate of end
sequences was greater than 98%.

Various policies of the United States and the World Medical
Association, specifically the Declaration of Helsinki, offer
recommendations for conducting experiments with human subjects. We
convened an Institutional Review Board (IRB) (31) that helped us
establish the protocol for obtaining and using human DNA and the
informed consent process used to enroll research volunteers for the
DNA-sequencing studies reported here. We adopted several steps and
procedures to protect the privacy rights and confidentiality of the
research subjects (donors). These included a two-stage consent process,
a secure random alphanumeric coding system for specimens and records,
circumscribed contact with the subjects by researchers, and options for
off-site contact of donors. In addition, Celera applied for and
received a Certificate of Confidentiality from the Department of Health
and Human Services. This Certificate authorized Celera to protect the
privacy of the individuals who volunteered to be donors as provided in
Section 301(d) of the Public Health Service Act 42 U.S.C. 241(d).

Celera and the IRB believed that the initial version of a completed
human genome should be a composite derived from multiple donors of
diverse ethnic backgrounds. Prospective donors were asked, on a
voluntary basis, to self-designate an ethnogeographic category (e.g.,
African-American, Chinese, Hispanic, Caucasian, etc.). We enrolled 21
donors (32).

Three basic items of information from each donor were recorded and
linked by confidential code to the donated sample: age, sex, and
self-designated ethnogeographic group. From females, ~130 ml of whole,
heparinized blood was collected. From males, ~130 ml of whole,
heparinized blood was collected, as well as five specimens of semen,
collected over a 6-week period. Permanent lymphoblastoid cell lines
were created by Epstein-Barr virus immortalization. DNA from five
subjects was selected for genomic DNA sequencing: two males and three
females, namely one African-American, one Asian-Chinese, one
Hispanic-Mexican, and two Caucasians (see Web fig. 2 on Science Online
at www.sciencemag.org/cgi/content/291/5507/1304/DC1). The decision of
whose DNA to sequence was based on a complex mix of factors, including
the goal of achieving diversity as well as technical issues such as the
quality of the DNA libraries and availability of immortalized cell
lines.

1.1 Library construction and 
sequencing 

Central to the whole-genome shotgun sequencing process is preparation
of high-quality plasmid libraries in a variety of insert sizes so that
pairs of sequence reads (mates) are obtained, one read from both ends
of each plasmid insert. High-quality libraries have an equal
representation of all parts of the genome, a small number of clones
without inserts, and no contamination from such sources as the
mitochondrial genome and Escherichia coli genomic DNA. DNA from each
donor was used to construct plasmid libraries in one or more of three
size classes: 2 kbp, 10 kbp, and 50 kbp (Table 1) (33).

In designing the DNA-sequencing process, we focused on developing a
simple system that could be implemented in a robust and reproducible
manner and monitored effectively (Fig. 2) (34).

Current sequencing protocols are based on the dideoxy sequencing method
(35), which typically yields only 500 to 750 bp of sequence per
reaction. This limitation on read length has made monumental gains in
throughput a prerequisite for the analysis of large eukaryotic genomes.
We accomplished this at the Celera facility, which occupies about
30,000 square feet of laboratory space and produces sequence data
continuously at a rate of 175,000 total reads per day. The
DNA-sequencing facility is supported by a high-performance
computational facility (36).

The process for DNA sequencing was modular by design and automated.
Intermodule sample backlogs allowed four principal modules to operate
independently: (i) library transformation, plating, and colony picking;
(ii) DNA template preparation; (iii) dideoxy sequencing reaction set-up
and purification; and (iv) sequence determination with the ABI PRISM
3700 DNA Analyzer. Because the inputs and outputs of each module have
been carefully matched and sample backlogs are continuously managed,
sequencing has proceeded without a single day's interruption since the
initiation of the Drosophila project in May 1999. The ABI 3700 is a
fully automated capillary array sequencer and as such can be operated
with a minimal amount of hands-on time, currently estimated at about 15
min per day. The capillary system also facilitates correct associations
of sequencing traces with samples through the elimination of manual
sample loading and lane-tracking errors associated with slab gels.
About 65 production staff were hired and trained, and were rotated on a
regular basis through the four production modules. A central laboratory
information management system (LIMS) tracked all sample plates by
unique bar code identifiers. The facility was supported by a quality
control team that performed raw material and in-process testing and a
quality assurance group with responsibilities including document
control, validation, and auditing of the facility. Critical to the
success of the scale-up was the validation of all software and
instrumentation before implementation, and production-scale testing of
any process changes.

1.2 Trace processing 

An automated trace-processing pipeline has been developed to process
each sequence file (37). After quality and vector trimming, the average
trimmed sequence length was 543 bp, and the sequencing accuracy was
exponentially distributed with a mean of 99.5% and with less than 1 in
1000 reads being less than 98% accurate (26). Each trimmed sequence was
screened for matches to contaminants including sequences of vector
alone, E. coli genomic DNA, and human mitochondrial DNA. The entire
read for any sequence with a significant match to a contaminant was
discarded. A total of 713 reads matched E. coli genomic DNA and 2114
reads matched the human mitochondrial genome.

1.3 Quality assessment and control 

The importance of the base-pair level accuracy of the sequence data
increases as the size and repetitive nature of the genome to be
sequenced increases. Each sequence read must be placed uniquely in the
genome, and even a modest error rate can reduce the effectiveness of
assembly. In addition, maintaining the validity of mate-pair
information is absolutely critical for the algorithms described below.
Procedural controls were established for maintaining the validity of
sequence mate-pairs as sequencing reactions proceeded through the
process, including strict rules built into the LIMS. The accuracy of
sequence data produced by the Celera process was validated in the
course of the Drosophila genome project (26). By collecting data for
the entire human genome in a single facility, we were able to ensure
uniform quality standards and the cost advantages associated with
automation, an economy of scale, and process consistency.

Table 1. Celera-generated data input into assembly.

                                      Number of reads for different insert libraries
Measure                     Individual        2 kbp       10 kbp       50 kbp        Total  Total base pairs
No. of sequencing reads     A                     0            0    2,767,357    2,767,357     1,502,674,851
                            B            11,736,757    7,467,755       66,930   19,271,442    10,464,393,006
                            C               853,819      881,290            0    1,735,109       942,164,187
                            D               952,523    1,046,815            0    1,999,338     1,085,640,534
                            F                     0    1,498,607            0    1,498,607       813,743,601
                            Total        13,543,099   10,894,467    2,834,287   27,271,853    14,808,616,179
Fold sequence coverage      A                     0            0         0.52         0.52
(2.9-Gb genome)             B                  2.20         1.40         0.01         3.61
                            C                  0.16         0.17            0         0.32
                            D                  0.18         0.20            0         0.37
                            F                     0         0.28            0         0.28
                            Total              2.54         2.04         0.53         5.11
Fold clone coverage         A                     0            0        18.39        18.39
                            B                  2.96        11.26         0.44        14.67
                            C                  0.22         1.33            0         1.54
                            D                  0.24         1.58            0         1.82
                            F                     0         2.26            0         2.26
                            Total              3.42        16.43        18.84        38.68
Insert size* (mean)         Average        1,951 bp    10,800 bp    50,715 bp
Insert size* (SD)           Average           6.10%        8.10%       14.90%
% Mates†                    Average           74.50        80.80        75.60

*Insert size and SD are calculated from assembly of mates on contigs.
†% Mates is based on laboratory tracking of sequencing runs.

2 Genome Assembly Strategy and Characterization

Summary. We describe in this section the two approaches that we used to
assemble the genome. One method involves the computational combination
of all sequence reads with shredded data from GenBank to generate an
independent, nonbiased view of the genome. The second approach involves
clustering all of the fragments to a region or chromosome on the basis
of mapping information. The clustered data were then shredded and
subjected to computational assembly. Both approaches provided
essentially the same reconstruction of assembled DNA sequence with
proper order and orientation. The second method provided slightly
greater sequence coverage (fewer gaps) and was the principal sequence
used for the analysis phase. In addition, we document the completeness
and correctness of this assembly process and provide a comparison to
the public sequence, which was reconstructed largely by an independent
BAC-by-BAC approach. Our assemblies effectively covered the euchromatic
regions of the human chromosomes. More than 90% of the genome was in
scaffold assemblies of 100,000 bp or greater, and 25% of the genome was
in scaffolds of 10 million bp or larger.

Fig. 2. Flow diagram for sequencing pipeline. Samples are received,
selected, and processed in compliance with standard operating
procedures, with a focus on quality within and across departments. Each
process has defined inputs and outputs with the capability to exchange
samples and data with both internal and external entities according to
defined quality guidelines. Manufacturing pipeline processes, products,
quality control measures, and responsible parties are indicated and are
described further in the text.

Shotgun sequence assembly is a classic example of an inverse problem:
given a set of reads randomly sampled from a target sequence,
reconstruct the order and the position of those reads in the target.
Genome assembly algorithms developed for Drosophila have now been
extended to assemble the ~25-fold larger human genome. Celera
assemblies consist of a set of contigs that are ordered and oriented
into scaffolds that are then mapped to chromosomal locations by using
known markers. The contigs consist of a collection of overlapping
sequence reads that provide a consensus reconstruction for a contiguous
interval of the genome. Mate pairs are a central component of the
assembly strategy. They are used to produce scaffolds in which the size
of gaps between consecutive contigs is known with reasonable precision.
This is accomplished by observing that a pair of reads, one of which is
in one contig, and the other of which is in another, implies an
orientation and distance between the two contigs (Fig. 3). Finally, our
assemblies did not incorporate all reads into the final set of reported
scaffolds. This set of unincorporated reads is termed "chaff," and
typically consisted of reads from within highly repetitive regions,
data from other organisms introduced through various routes as found in
many genome projects, and data of poor quality or with untrimmed
vector.

2.1 Assembly data sets

We used two independent sets of data for our assemblies. The first was
a random shotgun data set of 27.27 million reads of average length 543
bp produced at Celera. This consisted largely of mate-pair reads from
16 libraries constructed from DNA samples taken from five different
donors. Libraries with insert sizes of 2, 10, and 50 kbp were used. By
looking at how mate pairs from a library were positioned in known
sequenced stretches of the genome, we were able to characterize the
range of insert sizes in each library and determine a mean and standard
deviation. Table 1 details the number of reads, sequencing coverage,
and clone coverage achieved by the data set. The clone coverage is the
coverage of the genome in cloned DNA, considering the entire insert of
each clone that has sequence from both ends. The clone coverage
provides a measure of the amount of physical DNA coverage of the
genome. Assuming a genome size of 2.9 Gbp, the Celera trimmed sequences
gave a 5.1X coverage of the genome, and clone coverage was 3.42X,
16.40X, and 18.84X for the 2-, 10-, and 50-kbp libraries, respectively,
for a total of 38.7X clone coverage.
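
The coverage figures above can be checked with a few lines of arithmetic. The sketch below recomputes fold sequence coverage from the read count and mean trimmed read length, and approximates fold clone coverage from the Table 1 totals. The function names are illustrative, and the clone-coverage formula (mated clones times mean insert size, divided by genome size) is an assumption about how such a figure can be derived, not a statement of the procedure actually used; it reproduces the reported per-library values only to within about 1%.

```python
# Back-of-the-envelope coverage check using figures quoted in the text and Table 1.
GENOME_SIZE_BP = 2.9e9  # genome size assumed in the text


def fold_sequence_coverage(n_reads, mean_read_len_bp, genome_size_bp=GENOME_SIZE_BP):
    """Total sequenced bases divided by genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def fold_clone_coverage(n_reads, mate_fraction, mean_insert_bp,
                        genome_size_bp=GENOME_SIZE_BP):
    """Approximate physical coverage by clones sequenced from both ends.

    Assumption: each mated clone contributes two reads, so the number of
    such clones is n_reads * mate_fraction / 2.
    """
    n_clones = n_reads * mate_fraction / 2
    return n_clones * mean_insert_bp / genome_size_bp


# (total reads, fraction of reads with a mate, mean insert size in bp), from Table 1
LIBRARIES = {
    "2 kbp": (13_543_099, 0.745, 1_951),
    "10 kbp": (10_894_467, 0.808, 10_800),
    "50 kbp": (2_834_287, 0.756, 50_715),
}

sequence_coverage = fold_sequence_coverage(27_271_853, 543)
clone_coverage = {
    name: fold_clone_coverage(n_reads, frac, insert)
    for name, (n_reads, frac, insert) in LIBRARIES.items()
}
```

With these inputs, `sequence_coverage` rounds to 5.11, matching the reported fold sequence coverage, and the three clone-coverage estimates land close to the 3.42X, 16.43X, and 18.84X listed in Table 1.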

The second data set was from the publicly funded Human Genome Project
(PFP) and is primarily derived from BAC clones (30). The BAC data input
to the assemblies came from a download of GenBank on 1 September 2000
(Table 2) totaling 4443.3 Mbp of sequence. The data for each BAC is
deposited at one of four levels of completion. Phase 0 data are a set
of generally unassembled sequencing reads from a very light shotgun of
the BAC, typically less than 1X. Phase 1 data are unordered assemblies
of contigs, which we call BAC contigs or bactigs. Phase 2 data are
ordered assemblies of bactigs. Phase 3 data are complete BAC sequences.
In the past 2 years the PFP has focused on a product of lower quality
and completeness, but on a faster time-course, by concentrating on the
production of Phase 1 data from a 3X to 4X light-shotgun of each BAC
clone.

Fig. 3. Anatomy of whole-genome assembly. Overlapping shredded bactig
fragments (red lines) and internally derived reads from five different
individuals (black lines) are combined to produce a contig and a
consensus sequence (green line). Contigs are connected into scaffolds
(red) by using mate pair information. Scaffolds are then mapped to the
genome (gray line) with STS (blue star) physical map information.
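
To make concrete how a mate pair spanning two contigs implies a distance between them (Fig. 3), the following minimal sketch estimates a gap size from the library insert size. The coordinate convention, the `MatePair` fields, and the inverse-variance weighting used to combine several mate pairs are illustrative assumptions, not the scaffolding algorithm used for the assemblies reported here.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    """One mate pair linking a left contig to a right contig (hypothetical layout).

    Assumes both contigs are already oriented the same way, with the forward
    read in the left contig and the reverse read in the right contig.
    """
    insert_mean: float   # mean insert size of the library, in bp
    insert_sd: float     # standard deviation of the insert size, in bp
    start_in_left: int   # 0-based position in the left contig where the insert begins
    end_in_right: int    # position in the right contig where the insert ends (exclusive)


def gap_from_mate(mp, left_contig_len):
    """Gap implied by one mate pair: insert size minus the parts inside each contig."""
    inside_left = left_contig_len - mp.start_in_left
    return mp.insert_mean - inside_left - mp.end_in_right


def combine_gap_estimates(mates, left_contig_len):
    """Combine several mate pairs by inverse-variance weighting (an assumption)."""
    weights = [1.0 / mp.insert_sd ** 2 for mp in mates]
    estimates = [gap_from_mate(mp, left_contig_len) for mp in mates]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    sd = (1.0 / total) ** 0.5
    return mean, sd


# Two hypothetical mate pairs, one from a 2-kbp and one from a 10-kbp library.
mates = [
    MatePair(insert_mean=2_000, insert_sd=120, start_in_left=4_200, end_in_right=700),
    MatePair(insert_mean=10_000, insert_sd=800, start_in_left=1_000, end_in_right=5_400),
]
gap_mean, gap_sd = combine_gap_estimates(mates, left_contig_len=5_000)
```

Each additional mate pair spanning the same gap tightens the estimate, which is one way to read the statement that gap sizes are known with reasonable precision.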

We screened the bactig sequences for contaminants by using the BLAST
algorithm against three data sets: (i) vector sequences in Univec core
(38), filtered for a 25-bp match at 98% sequence identity at the ends
of the sequence and a 30-bp match internal to the sequence; (ii) the
nonhuman portion of the High Throughput Genomic (HTG) Sequences
division of GenBank (39), filtered at 200 bp at 98%; and (iii) the
nonredundant nucleotide sequences from GenBank without primate and
human virus entries, filtered at 200 bp at 98%. Whenever 25 bp or more
of vector was found within 50 bp of the end of a contig, the tip up to
the matching vector was excised. Under these criteria we removed 2.6
Mbp of possible contaminant and vector from the Phase 3 data, 61.0 Mbp
from the Phase 1 and 2 data, and 16.1 Mbp from the Phase 0 data (Table
2). This left us with a total of 4363.7 Mbp of PFP sequence data 20%
finished, 75% rough-draft (Phase 1 and 2), and 5% single sequencing
reads (Phase 0). An additional 104,018 BAC end-sequence mate pairs were
also downloaded and included in the data sets for both assembly
processes (18).
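
The tip-excision rule stated above (25 bp or more of vector within 50 bp of a contig end) can be sketched as a small interval computation. The function name, the half-open interval convention, and the choice to remove the matching vector together with the tip are assumptions made for illustration; the matches themselves would come from a separate similarity search, which is not shown.

```python
def trim_vector_tips(contig_len, vector_hits, min_match=25, max_dist=50):
    """Return the (start, end) half-open interval of a contig to keep.

    vector_hits is a list of (start, end) half-open intervals, in contig
    coordinates, where vector sequence was matched. A hit counts only if it
    is at least min_match bp long and lies within max_dist bp of a contig
    end. Assumption: the excised tip includes the vector match itself.
    """
    keep_start, keep_end = 0, contig_len
    for start, end in vector_hits:
        if end - start < min_match:
            continue  # match too short to trigger trimming
        if start <= max_dist:
            keep_start = max(keep_start, end)   # excise the left tip
        if contig_len - end <= max_dist:
            keep_end = min(keep_end, start)     # excise the right tip
    if keep_start >= keep_end:
        return (0, 0)  # nothing usable remains
    return (keep_start, keep_end)
```

For a hypothetical 1000-bp contig, a 30-bp vector match at positions 10 to 40 would leave the interval (40, 1000), while a match in the middle of the contig would leave the contig untouched under this rule.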

2.2 Assembly strategies 

Two different approaches to assembly were 
pursued. The first was a whole-genome as- 
sembly process that used Celera data and the 
PFP data in the form of additional synthetic 
shotgun data, and the second was a compart- 
mentalized assembly process that first parti- 
tioned the Celera and PFP data into sets 
localized to large chromosomal segments and 
then performed ab initio shotgun assembly on 
each set. Figure 4 gives a schematic of the 
overall process flow. 

For the whole-genome assembly, the PFP
data were first disassembled or "shredded" into a
synthetic shotgun data set of 550-bp reads that
form a perfect 2X covering of the bactigs. This
resulted in 16.05 million "faux" reads that were
sufficient to cover the genome 2.96X because
of redundancy in the BAC data set, without
incorporating the biases inherent in the PFP
assembly process. The combined data set of
43.32 million reads (8X), and all associated
mate-pair information, were then subjected to
our whole-genome assembly algorithm to produce
a reconstruction of the genome. Neither
the location of a BAC in the genome nor its
assembly of bactigs was used in this process.
Bactigs were shredded into reads because we
found strong evidence that 2.13% of them were
misassembled (40). Furthermore, BAC location
information was ignored because some BACs
were not correctly placed on the PFP physical
map and because we found strong evidence that

Table 2. GenBank data input into assembly.
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at least 22% of the BACs contained sequence 
data that were not part of the given BAC (41),
possibly as a result of sample-tracking errors



[Table 2 column headings: Center; Statistics; Completion phase sequence]

*Other centers contributing at least 0.1% of the sequence include: Chinese National Human Genome Center;
Genomanalyse Gesellschaft fuer Biotechnologische Forschung mbH; Genome Therapeutics Corporation; GENOSCOPE;
Chinese Academy of Sciences; Institute of Molecular Biotechnology; Keio University School of Medicine; Lawrence
Livermore National Laboratory; Cold Spring Harbor Laboratory; Los Alamos National Laboratory; Max-Planck Institut fuer
Molekulare Genetik; Japan Science and Technology Corporation; Stanford University; The Institute for Genomic
Research; The Institute of Physical and Chemical Research, Gene Bank; The University of Oklahoma; University of Texas
Southwestern Medical Center; University of Washington. †The 4,405,700,325 bases contributed by all centers were
shredded into faux reads resulting in 2.96X coverage of the genome.



(see below). In short, we performed a true, ab
initio whole-genome assembly in which we
took the expedient of deriving additional
sequence coverage, but not mate pairs, assembled
bactigs, or genome locality, from some
externally generated data.
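
The shredding step can be illustrated with a short sketch. It assumes, as one plausible reading of "a perfect 2X covering," that 550-bp faux reads are laid down at offsets of half a read length, with a final read placed flush against the end of the bactig; the function name `shred` and the handling of bactigs shorter than one read are assumptions of this example rather than details given in the text. Under this tiling, interior bases are covered twice while the outermost bases of each bactig are covered once.

```python
def shred(bactig, read_len=550, coverage=2):
    """Cut a bactig into overlapping faux reads tiling it at about coverage X."""
    if len(bactig) <= read_len:
        return [bactig]  # assumption: short bactigs are kept as one read
    step = read_len // coverage  # 275-bp offsets give a 2X tiling
    last_start = len(bactig) - read_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # final read ends flush with the bactig
    return [bactig[s:s + read_len] for s in starts]
```

For example, a 1100-bp bactig yields three reads starting at offsets 0, 275, and 550.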

In the compartmentalized shotgun assembly
(CSA), Celera and PFP data were partitioned
into the largest possible chromosomal segments,
or "components," that could be determined with
confidence, and then shotgun assembly was
applied to each partitioned subset wherein the
bactig data were again shredded into faux reads
to ensure an independent ab initio assembly of
the component. By subsetting the data in this
way, the overall computational effort was
reduced and the effect of interchromosomal
duplications was ameliorated. This also resulted in a
reconstruction of the genome that was relatively
independent of the whole-genome assembly
results so that the two assemblies could be
compared for consistency. The quality of the
partitioning into components was crucial so that
different genome regions were not mixed
together. We constructed components from (i) the
longest scaffolds of the sequence from each
BAC and (ii) assembled scaffolds of data unique
to Celera's data set. The BAC assemblies were
obtained by a combining assembler that used the
bactigs and the 5X Celera data mapped to these
bactigs as input. This effort was undertaken as
an interim step solely because the more accurate
and complete the scaffold for a given sequence
stretch, the more accurately one can tile these
scaffolds into contiguous components on the
basis of sequence overlap and mate-pair
information. We further visually inspected and
curated the scaffold tiling of the components to
further increase its accuracy. For the final CSA
assembly, all but the partitioning was ignored,
and an independent, ab initio reconstruction of
the sequence in each component was obtained
by applying our whole-genome assembly
algorithm to the partitioned, relevant Celera data and
the shredded, faux reads of the partitioned,
relevant bactig data.

2.3 Whole-genome assembly 

The algorithms used for whole-genome
assembly (WGA) of the human genome were
enhancements to those used to produce the
sequence of the Drosophila genome reported
in detail in (28).

The WGA assembler consists of a pipeline
composed of five principal stages: Screener,
Overlapper, Unitigger, Scaffolder, and Repeat
Resolver, respectively. The Screener finds
and marks all microsatellite repeats with less
than a 6-bp element, and screens out all
known interspersed repeat elements, including
Alu, Line, and ribosomal DNA. Marked
regions get searched for overlaps, whereas
screened regions do not get searched, but can
be part of an overlap that involves unscreened
matching segments.
The Overlapper compares every read
against every other read in search of complete 
end-to-end overlaps of at least 40 bp and with 
no more than 6% differences in the match. 
Because all data are scrupulously vector- 
trimmed, the Overlapper can insist on com- 
plete overlap matches. Computing the set of 
all overlaps took roughly 10,000 CPU hours
with a suite of four-processor Alpha SMPs
with 4 gigabytes of RAM. This took 4 to 5
days in elapsed time with 40 such machines
operating in parallel.
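
The overlap criterion (a complete end-to-end match of at least 40 bp with no more than 6% differences) can be made concrete with a deliberately naive sketch. It tests every suffix of one read against the matching prefix of another and counts substitutions only; the real Overlapper must also tolerate insertions and deletions and cannot afford a quadratic scan, so the function name `best_overlap` and the substitution-only error model are simplifying assumptions of this example.

```python
def best_overlap(a, b, min_len=40, max_diff=0.06):
    """Longest suffix of a matching a prefix of b within the error bound.

    Substitutions only (an assumption of this sketch); returns 0 if no
    overlap of at least min_len bases qualifies.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-n:], b[:n]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_diff * n:
            return n
    return 0
```

A production overlapper would typically find candidate pairs through shared exact seeds and then verify them with a banded alignment, rather than comparing all pairs of reads directly.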

Every overlap computed above is statistically
a 1-in-10^17 event and thus not a coincidental
event. What makes assembly combinatorially
difficult is that while many overlaps are
actually sampled from overlapping regions of
the genome, and thus imply that the sequence
reads should be assembled together, even more
overlaps are actually from two distinct copies
of a low-copy repeated element not screened
above, thus constituting an error if put
together. We call the former "true overlaps"
and the latter "repeat-induced overlaps." The
assembler must avoid choosing repeat-induced
overlaps, especially early in the process.

We achieve this objective in the Unitigger.
We first find all assemblies of reads that
appear to be uncontested with respect to all
other reads. We call the contigs formed from
these subassemblies unitigs (for uniquely
assembled contigs). Formally, these unitigs are
the uncontested interval subgraphs of the
graph of all overlaps (42). Unfortunately,
although empirically many of these assemblies
are correct (and thus involve only true
overlaps), some are in fact collections of reads
from several copies of a repetitive element
that have been overcollapsed into a single
subassembly. However, the overcollapsed
unitigs are easily identified because their
average coverage depth is too high to be
consistent with the overall level of sequence
coverage. We developed a simple statistical
discriminator that gives the logarithm of the
odds ratio that a unitig is composed of unique
DNA or of a repeat consisting of two or more
copies. The discriminator, set to a sufficiently
stringent threshold, identifies a subset of the
unitigs that we are certain are correct. In
addition, a second, less stringent threshold
identifies a subset of remaining unitigs very
likely to be correctly assembled, of which we
select those that will consistently scaffold
(see below), and thus are again almost certain
to be correct. We call the union of these two
sets U-unitigs. Empirically, we found from a
6X simulated shotgun of human chromosome
22 that we get U-unitigs covering 98% of the
stretches of unique DNA that are >2 kbp
long. We are further able to identify the
boundary of the start of a repetitive element
at the ends of a U-unitig and leverage this so
that U-unitigs span more than 93% of all
singly interspersed Alu elements and other
100- to 400-bp repetitive segments.
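
The text does not give the form of the discriminator, so the following is only one way such a log-odds score could be built. It assumes that read start points fall on the genome as a Poisson process: at rate N/G per base in unique DNA (N reads, genome length G) and at twice that rate in a unitig that collapses two copies of a repeat. For a unitig spanning Δ bases and containing k reads, the ratio of the two Poisson likelihoods then simplifies to (N/G)Δ − k ln 2. The function name and parameters are hypothetical.

```python
import math

def unique_log_odds(span_bp, n_reads, total_reads, genome_bp):
    """Natural-log odds that a unitig is unique rather than a two-copy repeat.

    Assumes Poisson read arrivals at rate total_reads / genome_bp in unique
    DNA and at twice that rate in a collapsed two-copy repeat.
    """
    rate = total_reads / genome_bp
    return rate * span_bp - n_reads * math.log(2)
```

Positive scores favor unique DNA and negative scores favor a collapsed repeat; a stringent threshold in the sense of the text corresponds to requiring a large positive score before a unitig is trusted.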

The result of running the Unitigger was
thus a set of correctly assembled subcontigs
covering an estimated 73.6% of the human
genome. The Scaffolder then proceeded to
use mate-pair information to link these
together into scaffolds. When there are two or
more mate pairs that imply that a given pair
of U-unitigs are at a certain distance and
orientation with respect to each other, the
probability of this being wrong is again
roughly 1 in 10^10, assuming that mate pairs
are false less than 2% of the time. Thus, one
can with high confidence link together all
U-unitigs that are linked by at least two 2- or
10-kbp mate pairs producing intermediate-sized
scaffolds that are then recursively
linked together by confirming 50-kbp mate
pairs and BAC end sequences. This process
yielded scaffolds that are on the order of
megabase pairs in size with gaps between
their contigs that generally correspond to
repetitive elements and occasionally to small
sequencing gaps. These scaffolds reconstruct
the majority of the unique sequence within a
genome.
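
The "at least two mate pairs" rule for linking U-unitigs amounts to counting agreeing evidence per unitig pair and keeping only pairs that reach the threshold. The sketch below shows that bookkeeping only. The tuple format for mate-pair evidence and the name `confident_links` are assumptions of this example, orientation is treated as an opaque label, and the check that the implied distances agree is omitted.

```python
from collections import Counter

def confident_links(mate_links, min_pairs=2):
    """Keep unitig pairs joined by at least min_pairs agreeing mate pairs.

    mate_links: iterable of (unitig_a, unitig_b, orientation) tuples, one
    per mate pair whose reads fall in different U-unitigs (hypothetical
    format).
    """
    counts = Counter()
    for a, b, orientation in mate_links:
        # store each unordered unitig pair under a single key
        key = (min(a, b), max(a, b), orientation)
        counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_pairs}
```

Links supported by a single mate pair are dropped, which is what keeps an isolated false mate pair from joining unrelated unitigs.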

For the Drosophila assembly, we engaged
in a three-stage repeat resolution strategy
where each stage was progressively more
aggressive and thus more likely to make a
mistake. For the human assembly, we continued
to use the first "Rocks" substage where
all unitigs with a good, but not definitive,
discriminator score are placed in a scaffold
gap. This was done with the condition that
two or more mate pairs with one of their
reads already in the scaffold unambiguously
place the unitig in the given gap. We estimate
the probability of inserting a unitig into an
incorrect gap with this strategy to be less than
10^-7 based on a probabilistic analysis.

We revised the ensuing "Stones" substage
of the human assembly, making it more like
the mechanism suggested in our earlier work
(43). For each gap, every read R that is placed
in the gap by virtue of its mated pair M being
in a contig of the scaffold and implying R's
placement is collected. Celera's mate-pairing
information is correct more than 99% of the
time. Thus, almost every, but not all, of the
reads in the set belong in the gap, and when
a read does not belong it rarely agrees with
the remainder of the reads. Therefore, we
simply assemble this set of reads within the
gap, eliminating any reads that conflict with
the assembly. This operation proved much
more reliable than the one it replaced for the
Drosophila assembly; in the assembly of a
simulated shotgun data set of human chromosome
22, all stones were placed correctly.

[Fig. 4 labels: Public bactigs (from 33,421 BACs); Bactigs & Celera pairs (binned by BAC); WGA Assembly; CSA Assembly]

Fig. 4. Architecture of Celera's two-pronged assembly strategy. Each oval denotes a computation
process performing the function indicated by its label, with the labels on arcs between ovals
describing the nature of the objects produced and/or consumed by a process. This figure
summarizes the discussion in the text that defines the terms and phrases used.

The final method of resolving gaps is to 
fill them with assembled BAC data that cover 
the gap. We call this external gap "walking." 
We did not include the very aggressive "Peb- 
bles" substage described in our Drosophila 
work, which made enough mistakes so as to 
produce repeat reconstructions for long inter- 
spersed elements whose quality was only 
99.62% correct. We decided that for the hu- 
man genome it was philosophically better not 
to introduce a step that was certain to produce 
less than 99.99% accuracy. The cost was a 
somewhat larger number of gaps of some- 
what larger size. 

At the final stage of the assembly process, 
and also at several intermediate points, a 
consensus sequence of every contig is pro- 
duced. Our algorithm is driven by the princi- 
ple of maximum parsimony, with quality- 
value-weighted measures for evaluating each 
base. The net effect is a Bayesian estimate of 
the correct base to report at each position. 
Consensus generation uses Celera data when- 
ever it is present. In the event that no Celera 
data cover a given region, the BAC data 
sequence is used. 
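
A much-simplified version of the consensus rule is sketched below for a single alignment column. It is not the authors' Bayesian procedure: summing quality values per candidate base and taking the largest total is a stand-in chosen here for illustration, and the `(base, quality, source)` tuple layout and the name `consensus_base` are assumptions. The one rule taken directly from the text is that Celera data take precedence whenever they are present, with BAC-derived data used only where no Celera read covers the position.

```python
from collections import defaultdict

def consensus_base(column):
    """Pick the base with the largest summed quality in one alignment column.

    column: list of (base, quality, source) with source "celera" or "bac"
    (hypothetical layout). Celera reads take precedence when present.
    """
    celera = [c for c in column if c[2] == "celera"]
    votes = defaultdict(int)
    for base, quality, _ in (celera or column):
        votes[base] += quality
    return max(votes, key=votes.get)
```

In this sketch a high-quality BAC base cannot override even a single low-quality Celera base, which mirrors the stated precedence but is a stronger simplification than a full probabilistic model would make.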

A key element of achieving a WGA of the
human genome was to parallelize the Overlapper
and the central consensus sequence-constructing
subroutines. In addition, memory was
a real issue: a straightforward application of
the software we had built for Drosophila would
have required a computer with a 600-gigabyte
RAM. By making the Overlapper and Unitigger
incremental, we were able to achieve the same
computation with a maximum of instantaneous
usage of 28 gigabytes of RAM. Moreover, the
incremental nature of the first three stages
allowed us to continually update the state of this
part of the computation as data were delivered
and then perform a 7-day run to complete
Scaffolding and Repeat Resolution whenever
desired. For our assembly operations, the total
compute infrastructure consists of 10
four-processor SMPs with 4 gigabytes of memory per
cluster (Compaq's ES40, Regatta) and a
16-processor NUMA machine with 64 gigabytes
of memory (Compaq's GS160, Wildfire). The
total compute for a run of the assembler was
roughly 20,000 CPU hours.

The assembly of Celera's data, together
with the shredded bactig data, produced a set of
scaffolds totaling 2.848 Gbp in span and
consisting of 2.586 Gbp of sequence. The chaff, or
set of reads not incorporated in the assembly,
numbered 11.27 million (26%), which is
consistent with our experience for Drosophila.
More than 84% of the genome was covered by
scaffolds >100 kbp long, and these averaged
91% sequence and 9% gaps with a total of
2.297 Gbp of sequence. There were a total of
93,857 gaps among the 1637 scaffolds >100
kbp. The average scaffold size was 1.5 Mbp,
the average contig size was 24.06 kbp, and the
average gap size was 2.43 kbp, where the
distribution of each was essentially exponential.
More than 50% of all gaps were less than 500
bp long, >62% of all gaps were less than 1 kbp
long, and no gap was >100 kbp long. Similarly,
more than 65% of the sequence is in contigs
>30 kbp, more than 31% is in contigs >100
kbp, and the largest contig was 1.22 Mbp long.
Table 3 gives detailed summary statistics for
the structure of this assembly with a direct
comparison to the compartmentalized shotgun
assembly.

2.4 Compartmentalized shotgun 
assembly 

In addition to the WGA approach, we pursued
a localized assembly approach that was
intended to subdivide the genome into segments,
each of which could be shotgun assembled
individually. We expected that this
would help in resolution of large
interchromosomal duplications and improve the
statistics for calculating U-unitigs. The
compartmentalized assembly process involved
clustering Celera reads and bactigs into large,
multiple megabase regions of the genome,
and then running the WGA assembler on the
Celera data and shredded, faux reads obtained
from the bactig data.

The first phase of the CSA strategy was to 
separate Celera reads into those that matched 
the BAC contigs for a particular PFP BAC 
entry, and those that did not match any public 
data. Such matches must be guaranteed to 



Table 3. Scaffold statistics for whole-genome and compartmentalized shotgun assemblies.

                                        Scaffold size
                                        All            >30 kbp        >100 kbp       >500 kbp       >1000 kbp

Compartmentalized shotgun assembly
No. of bp in scaffolds
  (including intrascaffold gaps)        2,905,568,203  2,748,892,430  2,700,489,906  2,489,357,260  2,248,689,128
No. of bp in contigs                    2,653,979,733  2,524,251,302  2,491,538,372  2,320,648,201  2,106,521,902
No. of scaffolds                        53,591         2,845          1,935          1,060          721
No. of contigs                          170,033        [...]          107,199        93,138         82,009
No. of gaps                             116,442        109,362        105,264        92,078         81,288
No. of gaps ≤1 kbp                      72,091         [...]          67,289         59,915         53,354
Average scaffold size (bp)              54,217         966,219        1,395,602      2,348,450      3,118,848
Average contig size (bp)                15,609         22,496         23,242         24,916         25,686
Average intrascaffold gap size (bp)     2,161          2,054          1,985          1,832          1,749
Largest contig (bp)                     1,988,321      [...]          1,988,321      1,988,321      1,988,321
% of total contigs                      100            95             94             87             79

Whole-genome assembly
No. of bp in scaffolds
  (including intrascaffold gaps)        [...]          [...]          [...]          [...]          [...]
No. of bp in contigs                    [...]          2,334,343,339  2,297,678,935  2,143,002,184  1,983,305,432
No. of scaffolds                        [...]          2,507          1,637          818            [...]
No. of contigs                          221,036        [...]          95,494         84,641         76,285
No. of gaps                             [...]          96,682         93,857         83,823         [...]
No. of gaps ≤1 kbp                      [...]          60,343         59,156         54,079         49,592
Average scaffold size (bp)              [...]          1,027,041      1,542,660      2,846,620      [...]
Average contig size (bp)                [...]          [...]          24,061         25,319         [...]
Average intrascaffold gap size (bp)     [...]          2,487          2,426          2,213          [...]
Largest contig (bp)                     [...]          1,224,073      [...]          1,224,073      1,224,073
% of total contigs                      [...]          90             [...]          83             [...]

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properly place a Celera read, so all reads were 
first masked against a library of common 
repetitive elements, and only matches of at 
least 40 bp to unmasked portions of the read 
constituted a hit. Of Celera's 27.27 million
reads, 20.76 million matched a bactig and
another 0.62 million reads, which did not
have any matches, were nonetheless identified
as belonging in the region of the bactig's
BAC because their mate matched the bactig.
Of the remaining reads, 2.92 million were
completely screened out and so could not be
matched, but the other 2.97 million reads had
unmasked sequence totaling 1.189 Gbp that
were not found in the GenBank data set.
Because the Celera data are 5.11X redundant,
we estimate that 240 Mbp of unique Celera
sequence is not in the GenBank data set.

In the next step of the CSA process, a 
combining assembler took the relevant 5X 
Celera reads and bactigs for a BAC entry, and 
produced an assembly of the combined data 
for that locale. These high-quality sequence 
reconstructions were a transient result whose 
utility was simply to provide more reliable 
information for the purposes of their tiling 
into sets of overlapping and adjacent scaffold 
sequences in the next step. In outline, the 
combining assembler first examines the set of 
matching Celera reads to determine if there 
are excessive pileups indicative of un- 
screened repetitive elements. Wherever these 
occur, reads in the repeat region whose mates 
have not been mapped to consistent positions 
are removed. Then all sets of mate pairs that 
consistently imply the same relative position 
of two bactigs are bundled into a link and 
weighted according to the number of mates in 
the bundle. A "greedy" strategy then attempts 
to order the bactigs by selecting bundles of 
mate-pairs in order of their weight. A selected 
mate-pair bundle can tie together two forma- 
tive scaffolds. It is incorporated to form a 
single scaffold only if it is consistent with the 
majority of links between contigs of the scaf- 
fold. Once scaffolding is complete, gaps are 
filled by the "Stones" strategy described 
above for the WGA assembler. 
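
The greedy bundle-selection step can be sketched with a union-find structure: bundles are visited from heaviest to lightest, and a bundle that ties together two formative scaffolds merges them if it passes a consistency check. The sketch below only groups bactigs into scaffolds and does not compute order or orientation. The `is_consistent` callback is a placeholder for the "consistent with the majority of links" test described in the text, and all names and the `(weight, bactig_a, bactig_b)` bundle format are assumptions of this example.

```python
def greedy_scaffold(bactigs, bundles, is_consistent=lambda a, b: True):
    """Greedily merge bactigs into scaffolds, heaviest mate-pair bundle first.

    bundles: list of (weight, bactig_a, bactig_b); is_consistent is a
    placeholder for the majority-of-links check described in the text.
    """
    parent = {b: b for b in bactigs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for weight, a, b in sorted(bundles, key=lambda t: -t[0]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b and is_consistent(a, b):
            parent[root_b] = root_a  # tie the two formative scaffolds
    groups = {}
    for b in bactigs:
        groups.setdefault(find(b), []).append(b)
    return sorted(groups.values())
```

Because the heaviest bundles are committed first, weakly supported links can only join scaffolds that the stronger evidence has already shaped, which is the usual motivation for a greedy ordering of this kind.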

The GenBank data for the Phase 1 and 2
BACs consisted of an average of 19.8 bactigs
per BAC of average size 8099 bp. Application
of the combining assembler resulted in
individual Celera BAC assemblies being put
together into an average of 1.83 scaffolds
(median of 1 scaffold) consisting of an average
of 8.57 contigs of average size 18,973 bp.
In addition to defining order and orientation
of the sequence fragments, there were 57%
fewer gaps in the combined result. For Phase
0 data, the average GenBank entry consisted
of 91.52 reads of average length 784 bp.
Application of the combining assembler resulted
in an average of 54.8 scaffolds consisting
of an average of 58.1 contigs of average
size 873 bp. Basically, some small amount of
assembly took place, but not enough Celera
data were matched to truly assemble the 0.5X
to 1X data set represented by the typical
Phase 0 BACs. The combining assembler
was also applied to the Phase 3 BACs for
SNP identification, confirmation of assembly,
and localization of the Celera reads. The
Phase 0 data suggest that a combined
whole-genome shotgun data set and 1X light-shotgun
of BACs will not yield good assembly of
BAC regions; at least 3X light-shotgun of
each BAC is needed.

The 5.89 million Celera fragments not
matching the GenBank data were assembled 
with our whole-genome assembler. The as- 
sembly resulted in a set of scaffolds totaling 
442 Mbp in span and consisting of 326 Mbp 
of sequence. More than 20% of the scaffolds 
were >5 kbp long, and these averaged 63% 
sequence and 27% gaps with a total of 302 
Mbp of sequence. All scaffolds >5 kbp were 
forwarded along with all scaffolds produced 
by the combining assembler to the subse- 
quent tiling phase. 

At this stage, we typically had one or two 
scaffolds for every BAC region constituting 
at least 95% of the relevant sequence, and a
collection of disjoint Celera-unique scaffolds.
The next step in developing the genome
components was to determine the order and
overlap tiling of these BAC and Celera-unique
scaffolds across the genome. For this, we
used Celera's 50-kbp mate-pairs information,
and BAC-end pairs (18) and sequence tagged
site (STS) markers (44) to provide long-range
guidance and chromosome separation.
Given the relatively manageable number of
scaffolds, we chose not to produce this tiling 
in a fully automated manner, but to compute 
an initial tiling with a good heuristic and then 
use human curators to resolve discrepancies 
or missed join opportunities. To this end, we 
developed a graphical user interface that dis- 
played the graph of tiling overlaps and the 
evidence for each. A human curator could 
then explore the implication of mapped STS 
data, dot-plots of sequence overlap, and a 
visual display of the mate-pair evidence sup- 
porting a given choice. The result of this 
process was a collection of "components," 
where each component was a tiled set of 
BAC and Celera-unique scaffolds that had 
been curator-approved. The process resulted 
in 3845 components with an estimated span 
of 2.922 Gbp. 

In order to generate the final CSA, we 
assembled each component with the WGA 
algorithm. As was done in the WGA process, 
the bactig data were shredded into a synthetic 
2X shotgun data set in order to give the 
assembler the freedom to independently as- 
semble the data. By using faux reads rather 
than bactigs, the assembly algorithm could 
correct errors in the assembly of bactigs and 
remove chimeric content in a PFP data entry. 



Chimeric or contaminating sequence (from
another part of the genome) would not be
incorporated into the reassembly of the
component because it did not belong there. In
effect, the previous steps in the CSA process
served only to bring together Celera fragments
and PFP data relevant to a large
contiguous segment of the genome, wherein we
applied the assembler used for WGA to
produce an ab initio assembly of the region.

WGA assembly of the components resulted
in a set of scaffolds totaling 2.906 Gbp in
span and consisting of 2.654 Gbp of sequence.
The chaff, or set of reads not incorporated
into the assembly, numbered 6.17
million, or 22%. More than 90.0% of the
genome was covered by scaffolds spanning
>100 kbp, and these averaged 92.2%
sequence and 7.8% gaps with a total of 2.492
Gbp of sequence. There were a total of
105,264 gaps among the 107,199 contigs that
belong to the 1940 scaffolds spanning >100
kbp. The average scaffold size was 1.4 Mbp,
the average contig size was 23.24 kbp, and
the average gap size was 2.0 kbp, where each
distribution of sizes was exponential. As
such, averages tend to be underrepresentative
of the majority of the data. Figure 5 shows a
histogram of the bases in scaffolds of various
size ranges. Consider also that more than
49% of all gaps were <500 bp long, more
than 62% of all gaps were <1 kbp, and all
gaps are <100 kbp long. Similarly, more than
73% of the sequence is in contigs >30 kbp,
more than 49% is in contigs >100 kbp, and
the largest contig was 1.99 Mbp long. Table 3
provides summary statistics for the structure
of this assembly with a direct comparison to
the WGA assembly.

2.5 Comparison of the WGA and CSA 
scaffolds 

Having obtained two assemblies of the hu- 
man genome via independent computational 
processes (WGA and CSA), we compared
scaffolds from the two assemblies as another 
means of investigating their completeness, 
consistency, and contiguity. From each as- 
sembly, a set of reference scaffolds contain- 
ing at least 1000 fragments (Celera sequenc- 
ing reads or bactig shreds) was obtained; this 
amounted to 2218 WGA scaffolds and 1717 
CSA scaffolds, for a total of 2.087 Gbp and
2.474 Gbp, respectively. The sequence of each reference
scaffold was compared to the sequence of all 
scaffolds from the other assembly with which 
it shared at least 20 fragments or at least 20% 
of the fragments of the smaller scaffold. For 
each such comparison, all matches of at least 
200 bp with at most 2% mismatch were 
tabulated. 
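
The two selection rules in this comparison can be written down directly. The sketch below assumes each scaffold is represented by the set of fragment identifiers it contains; the function names `should_compare` and `keep_match` and their parameters are hypothetical and not taken from the authors' software.

```python
def should_compare(ref_frags, other_frags, min_shared=20, min_fraction=0.20):
    """True if two scaffolds share enough fragments to be aligned."""
    ref, other = set(ref_frags), set(other_frags)
    shared = len(ref & other)
    smaller = min(len(ref), len(other))
    # at least 20 shared fragments, or at least 20% of the smaller scaffold
    return shared >= min_shared or (smaller > 0 and shared >= min_fraction * smaller)


def keep_match(length_bp, mismatches, min_len=200, max_mismatch=0.02):
    """True if a sequence match meets the 200-bp, 2%-mismatch criterion."""
    return length_bp >= min_len and mismatches <= max_mismatch * length_bp
```

The fractional rule matters for small scaffolds: one holding only 50 fragments can never share 20 with a partner that overlaps it by a fifth, but still qualifies through the 20% criterion.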

From this tabulation, we estimated the 
amount of unique sequence in each assembly
in two ways. The first was to determine the 
number of bases of each assembly that were 
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not covered by a matching segment in the 
other assembly. Some 82.5 Mbp of the WGA 
(3.95%) was not covered by the CSA, where- 
as 204.5 Mbp (8.26%) of the CSA was not 
covered by the WGA. This estimate did not 
require any consistency of the assemblies or 
any uniqueness of the matching segments.
Thus, another analysis was conducted in
which matches of less than 1 kbp between a
pair of scaffolds were excluded unless they 
were confirmed by other matches having a 
consistent order and orientation. This gives 
some measure of consistent coverage: 1.982 
Gbp (95.00%) of the WGA is covered by the 
CSA, and 2.169 Gbp (87.69%) of the CSA is 
covered by the WGA by this more stringent 
measure. 

The comparison of WGA to CSA also
permitted evaluation of scaffolds for structural
inconsistencies. We looked for instances in
which a large section of a scaffold from one
assembly matched only one scaffold from the
other assembly, but failed to match over the
full length of the overlap implied by the
matching segments. An initial set of candidates
was identified automatically, and then
each candidate was inspected by hand. From
this process, we identified 31 instances in
which the assemblies appear to disagree in a
nonlocal fashion. These cases are being further
evaluated to determine which assembly
is in error and why.

In addition, we evaluated local inconsis- 
tencies of order or orientation. The following 
results exclude cases in which one contig in
one assembly corresponds to more than one 
overlapping contig in the other assembly (as 
long as the order and orientation of the latter 
agrees with the positions they match in the 
former). Most of these small rearrangements 
involved segments on the order of hundreds 
of base pairs and rarely >1 kbp. We found a 
total of 295 kbp (0.012%) in the CSA assem- 
blies that were locally inconsistent with the 
WGA assemblies, whereas 2.108 Mbp 
(0.11%) in the WGA assembly were incon- 
sistent with the CSA assembly. 
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The CSA assembly was a few percentage 
points better in terms of coverage and slightly 
more consistent than the WGA, because it 
was in effect performing a few thousand shot- 
gun assemblies of megabase-sized problems, 
whereas the WGA is performing a shotgun 
assembly of a gigabase-sized problem. When 
one considers the increase of two-and-a-half 
orders of magnitude in problem size, the
information loss between the two is remarkably
small. Because CSA was logistically easier to
deliver and the better of the two results available
at the time when downstream analyses
needed to be begun, all subsequent analysis
was performed on this assembly.

2.6 Mapping scaffolds to the genome 

The final step in assembling the genome was to
order and orient the scaffolds on the chromosomes.
We first grouped scaffolds together on
the basis of their order in the components from
CSA. These grouped scaffolds were reordered
by examining residual mate-pairing data between
the scaffolds. We next mapped the scaffold
groups onto the chromosome using physical
mapping data. This step depends on having
reliable high-resolution map information such
that each scaffold will overlap multiple markers.
There are two genome-wide types of map
information available: high-density STS maps
and fingerprint maps of BAC clones developed
at Washington University (45). Among the
genome-wide STS maps, GeneMap99 (GM99)
has the most markers and therefore was most
useful for mapping scaffolds. The two different
mapping approaches are complementary to one
another. The fingerprint maps should have better
local order because they were built by comparison
of overlapping BAC clones. On the
other hand, GM99 should have a more reliable
long-range order, because the framework markers
were derived from well-validated genetic
maps. Both types of maps were used as a
reference for human curation of the components
that were the input to the regional assembly,
but they did not determine the order of
sequences produced by the assembler.
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Scaffold Size 

Fig. 5. Distribution of scaffold sizes of the CSA. For each range of scaffold sizes, the percent of total
sequence is indicated.



In order to determine the effectiveness of
the fingerprint maps and GM99 for mapping
scaffolds, we first examined the reliability of
these maps by comparison with large scaffolds.
Only 1% of the STS markers on the 10
largest scaffolds (those >9 Mbp) were
mapped on a different chromosome on
GM99. Two percent of the STS markers
disagreed in position by more than five
framework bins. However, for the fingerprint
maps, a 2% chromosome discrepancy was
observed, and on average 23.8% of BAC
locations in the scaffold sequence disagreed
with fingerprint map placement by more than
five BACs. When further examining the
source of discrepancy, it was found that most
of the discrepancy came from 4 of the 10
scaffolds, indicating that there is variation in
the quality of either the map or the scaffolds.
All four scaffolds were assembled as well as
the other six, as judged by clone coverage
analysis, and showed the same low discrepancy
rate to GM99, and thus we concluded
that the fingerprint map global order in these
cases was not reliable. Smaller scaffolds had
a higher discordance rate with GM99 (4.21%
of STSs were discordant by more than five
framework bins), but a lower discordance rate
with the fingerprint maps (11% of BACs
disagreed with fingerprint maps by more than
five BACs). This observation agrees with the
clone coverage analysis (46) that Celera scaffold
construction was better supported by
long-range mate pairs in larger scaffolds than
in small scaffolds.

We created two orderings of Celera scaf- 
folds on the basis of the markers (BAC or 
STS) on these maps. Where the order of 
scaffolds agreed between GM99 and the 
WashU BAC map, we had a high degree of 
confidence that that order was correct; these 
scaffolds were termed "anchor scaffolds." 
Only scaffolds with a low overall discrepancy 
rate with both maps were considered anchor 
scaffolds. Scaffolds in GM99 bins were al- 
lowed to permute in their order to match 
WashU ordering, provided they did not vio- 
late their framework orders. Orientation of 
individual scaffolds was determined by the 
presence of multiple mapped markers with consistent
order. Scaffolds with only one marker have
insufficient information to assign orientation. We
found 70.1% of the genome in anchored scaffolds, more
than 99% of which are also oriented (Table 4). Because
GM99 is of lower resolution than the WashU map, a
number of scaffolds without STS matches could be
ordered relative to the anchored scaffolds because
they included sequence from the same or adjacent BACs
on the WashU map. On the other hand, because of
occasional WashU global ordering discrepancies, a
number of scaffolds determined to be "unmappable" on
the WashU map could be ordered relative to the
anchored scaffolds with GM99. These scaffolds were
termed "ordered scaffolds." We found that 13.9% of
the assembly could be ordered by these additional
methods, and thus 84.0% of the genome was ordered
unambiguously.

Next, all scaffolds that could be placed, 
but not ordered, between anchors were as- 
signed to the interval between the anchored 
scaffolds and were deemed to be "bounded" between
them. For example, small scaffolds having STS hits
from the same GeneMap bin or hitting the same BAC
cannot be ordered relative to each other, but can be
assigned a placement boundary relative to other
anchored or ordered scaffolds. The remaining
scaffolds either had no localization information,
conflicting information, or could only be assigned to
a generic chromosome location. Using the above
approaches, ~98% of the genome was anchored, ordered,
or bounded.

Finally, we assigned a location for each 
scaffold placed on the chromosome by 
spreading out the scaffolds per chromosome. 
We assumed that the remaining unmapped scaffolds,
constituting 2% of the genome, were distributed
evenly across the genome. By dividing the sum of
unmapped scaffold lengths by the sum of the number of
mapped scaffolds, we arrived at an estimate of
interscaffold gap of 1483 bp. This gap was used to
separate all the scaffolds on each chromosome and to
assign an offset in the chromosome.
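
The gap estimate and offset assignment described above can be illustrated with a short Python sketch. The function names and the toy lengths are hypothetical and are not taken from the Celera pipeline; the sketch only assumes that the gap is the total unmapped length divided by the number of mapped scaffolds, and that scaffolds are then laid end to end with that fixed gap.

```python
def interscaffold_gap(unmapped_lengths, n_mapped):
    """Estimated gap (bp): total unmapped length / number of mapped scaffolds."""
    if n_mapped <= 0:
        raise ValueError("need at least one mapped scaffold")
    return sum(unmapped_lengths) // n_mapped


def assign_offsets(ordered_lengths, gap):
    """Start offset of each scaffold when placed in order with a fixed gap."""
    offsets, position = [], 0
    for length in ordered_lengths:
        offsets.append(position)
        position += length + gap
    return offsets


# Toy numbers for illustration only (not Celera data).
gap = interscaffold_gap([3000, 1500, 1500], n_mapped=4)
offsets = assign_offsets([10_000, 25_000, 8_000], gap)
print(gap, offsets)
```

Integer division is used here because offsets are base-pair coordinates; whether the original estimate was rounded or truncated is not stated in the text.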

During the scaffold-mapping effort, we encountered
many problems that resulted in additional quality
assessment and validation analysis. At least 978 (3%
of 33,173) BACs were believed to have sequence data
from more than one location in the genome (47). This
is consistent with the bactig chimerism analysis
reported above in the Assembly Strategies section.
These BACs could not be assigned to unique positions
within the CSA assembly and thus could not be used
for ordering scaffolds. Likewise, it was not always
possible to assign STSs to unique locations in the
assembly because of genome duplications, repetitive
elements, and pseudogenes.

Because of the time required for an ex- 
haustive search for a perfect overlap, CSA 
generated 21,607 intrascaffold gaps where 
the mate-pair data suggested that the contigs 
should overlap, but no overlap was found. 
These gaps were defined as a fixed 50 bp in 
length and make up 18.6% of the total 
116,442 gaps in the CSA assembly. 

We chose not to use the order of exons implied in
cDNA or EST data as a way of ordering scaffolds. The
rationale for not using these data was that doing so
would have biased certain regions of the assembly by
rearranging scaffolds to fit the transcript data and
made validation of both the assembly and gene
definition processes more difficult.
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Assembly and validation analysis 

We analyzed the assembly of the genome from the
perspectives of completeness (amount of coverage of
the genome) and correctness (the structural accuracy
of the order and orientation and the consensus
sequence of the assembly).

Completeness. Completeness is defined as the
percentage of the euchromatic sequence represented in
the assembly. This cannot be known with absolute
certainty until the euchromatin sequence has been
completed. However, it is possible to estimate
completeness on the basis of (i) the estimated sizes
of intrascaffold gaps; (ii) coverage of the two
published chromosomes, 21 and 22 (48, 49); and (iii)
analysis of the percentage of an independent set of
random sequences (STS markers) contained in the
assembly. The whole-genome libraries contain
heterochromatic sequence and, although no attempt has
been made to assemble it, there may be instances of
unique sequence embedded in regions of
heterochromatin as were observed in Drosophila
(50, 51).

The sequences of human chromosomes 21 and 22 have
been completed to high quality and published
(48, 49). Although this sequence served as input to
the assembler, the finished sequence was shredded
into a shotgun data set so that the assembler had the
opportunity to assemble it differently from the
original sequence in the case of structural
polymorphisms or assembly errors in the BAC data. In
particular, the assembler must be able to resolve
repetitive elements at the scale of components
(generally multimegabase in size), and so this
comparison reveals
the level to which the assembler resolves 
repeats. In certain areas, the assembly struc- 
ture differs from the published versions of 
chromosomes 21 and 22 (see below). The 
consequence of the flexibility to assemble 
"finished" sequence differently on the basis 
of Celera data resulted in an assembly with 
more segments than the chromosome 21 and 
22 sequences. We examined the reasons why 
there are more gaps in the Celera sequence 
than in chromosomes 21 and 22 and expect 
that they may be typical of gaps in other 
regions of the genome. In the Celera assem- 
bly, there are 25 scaffolds, each containing at 
least 10 kb of sequence, that collectively span 
94.3% of chromosome 21. Sixty-two scaf- 
folds span 95.7% of chromosome 22. The 
total length of the gaps remaining in the 
Celera assembly for these two chromosomes 
is 3.4 Mbp. These gap sequences were ana- 
lyzed by RepeatMasker and by searching 
against the entire genome assembly (52). 
About 50% of the gap sequence consisted of common
repetitive elements identified by RepeatMasker; more
than half of the remainder was lower copy number
repeat elements.

A more global way of assessing completeness is to
measure the content of an independent set of sequence
data in the assembly. We compared 48,938 STS markers
from Genemap99 (51) to the scaffolds. Because these
markers were not used in the assembly processes, they
provided a truly independent measure of completeness.
ePCR (53) and BLAST (54) were used to locate STSs on
the assembled genome. We found 44,524 (91%) of the
STSs in the mapped genome. An additional 2648 markers
(5.4%) were found by searching the unassembled data
or "chaff." We identified 1283 STS markers (2.6%) not
found in either Celera sequence or BAC data as of
September 2000, raising the possibility that these
markers may not be of human origin. If that were the
case, the Celera assembled sequence would represent
93.4% of the human genome and the unassembled data
5.5%, for a total of 98.9% coverage. Similarly, we
compared CSA against 36,678 TNG radiation hybrid
markers (55a) using the same method. We found that
32,371 markers (88%) were located in the mapped CSA
scaffolds, with 2055 markers (5.6%) found in the
remainder. This gave a 94% coverage of the genome
through another genome-wide survey.
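
The marker-based completeness figures can be rechecked with a few lines of Python. This is only an arithmetic sketch: the function and key names are hypothetical, and the only inputs are the STS counts quoted in the paragraph above.

```python
def completeness(total, in_assembly, in_chaff, not_found):
    """Percent of markers located in the assembly and in the chaff,
    computed over all markers and over markers found in at least one data set."""
    usable = total - not_found  # markers assumed to be of human origin
    return {
        "assembly_all": 100 * in_assembly / total,
        "chaff_all": 100 * in_chaff / total,
        "assembly_excl": 100 * in_assembly / usable,
        "chaff_excl": 100 * in_chaff / usable,
    }


# STS counts quoted in the text.
sts = completeness(total=48_938, in_assembly=44_524,
                   in_chaff=2_648, not_found=1_283)
for name, value in sts.items():
    print(f"{name}: {value:.2f}%")
```

Excluding the 1283 markers found in neither data set shrinks the denominator, which is why the assembled fraction rises from about 91% to about 93.4%.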

Correctness. Correctness is defined as the structural
and sequence accuracy of the assembly. Because the
source sequences for the Celera data and the GenBank
data are from different individuals, we could not
directly compare the consensus sequence of the
assembly against other finished sequence for
determining sequencing accuracy at the nucleotide
level, although this has been done for identifying
polymorphisms as described in Section 6. The accuracy
of the consensus sequence is at least 99.96% on the
basis of a statistical estimate derived from the
quality values of the underlying reads.

Table 4. Summary of scaffold mapping. Scaffolds
were mapped to the genome with different levels
of confidence (anchored scaffolds have the highest
confidence; unmapped scaffolds have the lowest).
Anchored scaffolds were consistently ordered by
the WashU BAC map and GM99. Ordered scaffolds
were consistently ordered by at least one of
the following: the WashU BAC map, GM99, or
component tiling path. Bounded scaffolds had order
conflicts between at least two of the external
maps, but their placements were adjacent to a
neighboring anchored or ordered scaffold. Unmapped
scaffolds had, at most, a chromosome assignment.
The scaffold subcategories are given below each
category.

The structural consistency of the assembly can be
measured by mate-pair analysis. In a correct
assembly, every mated pair of sequencing reads should
be located on the consensus sequence with the correct
separation and orientation between the pairs. A pair
is termed "valid" when the reads are in the correct
orientation, and the distance between them is within
the mean ± 3 standard deviations of the distribution
of insert sizes of the library from which the pair
was sampled. A pair is termed "misoriented" when the
reads are not correctly oriented, and is termed
"misseparated" when the distance between the reads is
not in the correct range but the reads are correctly
oriented. The mean ± the standard deviation of each
library used by the assembler was determined as
described above. To validate these, we examined all
reads mapped to the finished sequence of chromosome
21 (48) and determined how many incorrect mate pairs
there were as a result of laboratory tracking errors
and chimerism (two different segments of the genome
cloned into the same plasmid), and how tight the
distribution of insert sizes was for those that were
correct (Table 5). The standard deviations for all
Celera libraries were quite small, less than 15% of
the insert length, with the exception of a few 50-kbp
libraries. The 2- and 10-kbp libraries contained less
than 2% invalid mate pairs, whereas the 50-kbp
libraries were somewhat higher (~10%). Thus, although
the mate-pair information was not perfect, its
accuracy was such that measuring valid, misoriented,
and misseparated pairs with respect to a given
assembly was deemed to be a reliable instrument for
validation purposes, especially when several mate
pairs confirm or deny an ordering.
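
The three mate-pair categories defined above reduce to a small decision rule. The Python sketch below is illustrative only; the function name and argument names are hypothetical, and the library mean and standard deviation in the example are the genome-wide values listed for library 1 in Table 5.

```python
def classify_mate_pair(distance, correctly_oriented, mean, sd, n_sd=3):
    """Classify one mate pair against its library's insert-size distribution.

    distance: observed separation of the two reads on the assembly (bp)
    correctly_oriented: True if the reads have the expected relative orientation
    """
    if not correctly_oriented:
        return "misoriented"
    if abs(distance - mean) <= n_sd * sd:
        return "valid"
    return "misseparated"


# Library 1 (2 kbp): mean 2,082 bp, SD 90 bp, so the valid window is 1,812 to 2,352 bp.
print(classify_mate_pair(2100, True, mean=2082, sd=90))
print(classify_mate_pair(2500, True, mean=2082, sd=90))
print(classify_mate_pair(2100, False, mean=2082, sd=90))
```

Orientation is tested first, so a pair that is both wrongly oriented and wrongly spaced is reported as misoriented, which matches the definition of "misseparated" as applying only to correctly oriented reads.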
The clone coverage of the genome was 39×, meaning
that any given base pair was, on average, contained
in 39 clones or, equivalently, spanned by 39
mate-paired reads. Areas of low clone coverage or
areas with a high proportion of invalid mate pairs
would indicate potential assembly problems. We
computed the coverage of each base in the assembly by
valid mate pairs (Table 6). In summary, for scaffolds
>30 kbp in length, less than 1% of the Celera
assembly was in regions of less than 3× clone
coverage. Thus, more than 99% of the assembly,
including order and orientation, is strongly
supported by this measure alone.
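
One way to compute per-base clone coverage, and the fraction of an assembly below a coverage threshold, is a difference array over clone intervals. The sketch below is an assumption about how such a calculation could be done, not a description of the software actually used; the function names and the toy intervals are hypothetical.

```python
from itertools import accumulate


def clone_coverage(length, clones):
    """Per-base coverage for half-open clone intervals [start, end),
    each spanning the region between a valid pair of mated reads."""
    delta = [0] * (length + 1)
    for start, end in clones:
        delta[start] += 1   # coverage rises where a clone begins
        delta[end] -= 1     # and falls just past where it ends
    return list(accumulate(delta[:-1]))


def fraction_below(coverage, threshold):
    """Fraction of bases whose coverage is below the threshold."""
    return sum(1 for c in coverage if c < threshold) / len(coverage)


# Toy 10-bp "scaffold" with four clones (illustration only).
cov = clone_coverage(10, [(0, 6), (2, 8), (4, 10), (5, 9)])
print(cov, fraction_below(cov, 3))
```

The difference array keeps the cost linear in the number of clones plus the scaffold length, rather than proportional to the total clone length.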

We examined the locations and number of all
misoriented and misseparated mates. In addition to
doing this analysis on the CSA assembly (as of 1
October 2000), we also performed a study of the PFP
assembly as of 5 September 2000 (30, 55b). In this
latter case, Celera mate pairs had to be mapped to
the PFP assembly. To avoid mapping errors due to
high-fidelity repeats, the only pairs
mapped were those for which both reads 
matched at only one location with less than 
6% differences. A threshold was set such that 
sets of five or more simultaneously invalid 
mate pairs indicated a potential breakpoint, 
where the construction of the two assemblies 
differed. The graphic comparison of the CSA 
chromosome 21 assembly with the published 
sequence (Fig. 6A) serves as a validation of 
this methodology. Blue tick marks in the 
panels indicate breakpoints. There were a 
similar (small) number of breakpoints on 
both chromosome sequences. The exception 
was 12 sets of scaffolds in the Celera assem- 
bly (a total of 3% of the chromosome length 
in 212 single-contig scaffolds) that were 
mapped to the wrong positions because they 
were too small to be mapped reliably. Figures 
6 and 7 and Table 6 illustrate the mate-pair 
differences and breakpoints between the two 
assemblies. There was a higher percentage of 
misoriented and misseparated mate pairs in 
the large-insert libraries (50 kbp and BAC
ends) than in the small-insert libraries in both 
assemblies (Table 6). The large-insert librar- 
ies are more likely to identify discrepancies 
simply because they span a larger segment of 
the genome. The graphic comparison be- 
tween the two assemblies for chromosome 8 
(Fig. 6, B and C) shows that there are many 



Table 5. Mate-pair validation. Celera fragment sequences were mapped to
the published sequence of chromosome 21. Each mate pair uniquely
mapped was evaluated for correct orientation and placement (number
of mate pairs tested). If the two mates had incorrect relative orientation
or placement, they were considered invalid (number of invalid mate
pairs). Columns 3 to 8 refer to chromosome 21; columns 9 to 11 refer to the genome.

              Chromosome 21                                                                          Genome
Library  No.  Mean insert (bp)  SD (bp)  SD/mean (%)  Pairs tested  Invalid pairs  % invalid        Mean insert (bp)  SD (bp)  SD/mean (%)
2 kbp    1    2,081             106      5.1          3,642         38             1.0              2,082             90       4.3
         2    1,913             152      7.9          28,029        413            1.5              1,923             118      6.1
         3    2,166             175      8.1          4,405         57             1.3              2,162             158      7.3
10 kbp   4    11,385            851      7.5          4,319         80             1.9              11,370            696      6.1
         5    14,523            1,875    12.9         7,355         156            2.1              14,142            1,402    9.9
         6    9,635             1,035    10.7         5,573         109            2.0              9,606             934      9.7
         7    10,223            928      9.1          34,079        399            1.2              10,190            777      7.6
50 kbp   8    64,888            2,747    4.2          16            1              6.3              65,500            5,504    8.4
         9    53,410            5,834    10.9         914           170            18.6             53,311            5,546    10.4
         10   52,034            7,312    14.1         5,871         569            9.7              51,498            6,588    12.8
         11   52,282            7,454    14.3         2,629         213            8.1              52,282            7,454    14.3
         12   46,616            7,378    15.8         2,153         215            10.0             45,418            9,068    20.0
         13   55,788            10,099   18.1         2,244         249            11.1             53,062            10,893   20.5
         14   39,894            5,019    12.6         199           7              3.5              36,838            9,988    27.1
BES      15   48,931            9,813    20.1         144           10             6.9              47,845            4,774    10.0
         16   48,130            4,232    8.8          195           14             7.2              47,924            4,581    9.6
         17   106,027           27,778   26.2         330           16             4.8              152,000           26,600   17.5
         18   160,575           54,973   34.2         155           8              5.2              161,750           27,000   16.7
         19   164,155           19,453   11.9         642           44                              176,500           19,500   11.05
Sum                                                   102,894       2,768          2.7 (mean = 2.7)

16 FEBRUARY 2001 VOL 291 SCIENCE www.sciencemag.org



THE HUMAN GENOME



; for the Celera ^^^'^i for both 

3Sseinbhesofeacnc ^^^^^^^n of 

side fashion. The «der ^y^^^my fewer 
Celera's ''ssembly shows ^j^^, 
b^mts except on *eJ*o ^^^^ 

mosomes. Figure 7 a»so « ^ 

gapshave^ne^don^^^^^edby 
mate-pair data. BreaKfwui . ^le two 

structural PO»V;"<>2^^,VoS^^erent hu- 
SS fStb'senome assemblies. 



full-length cDNA has ^^^^^ is 

known. De "0^° 8^^^ ^ ^ to find genes 
less accurate 's^^^^^^^^^^ homologous pro- 
d,at are not represented jy ^^^^^^ 
teins or ESTs- The lo ^ ^ ^ ^ to 
scribes the medwds we ha ^^^^^^ 




A oitPmative transcription ini- 
tivesplicmgand altem^^^^^ Our cells are 
tiation and tennmauon ^^^^^^ ,f ^ase 
able to discern signals for 

pairs of the S^^^T^^^ splicing to- 

initiating transcnption and lo^ ^^^^^ 

iosoines. Figure 7 aisou^^^^--.^-,-, structure of each gene and each tran ^^^^ZS':^^^^-^^!!,^ 

whereas later estima^es^om ^^ure, In *e proce the 

>100.00P (^-J)- 'J^^t'jn ctors. based on 
the corporate and P**"^ q |,iand, and 
extrapolations J/, extrapolations, have 
transcript density-ba^ed exttap ^^^^ 

„ot reduc^Sj,Xnes^mSes from a 

;:^rf.lln^ePha™agu^^^^ "^TF^r :^pr: 
b^d on a -t^f:^^'cpG islands (57). '^Jio jTa nUer of ESTs and 

association of ESTs w^ Cp^^ ^^^^^ ^TL^ThS or not they can be connec.- 



3 Gene Prediction and Annotation

Summary. To enumerate the gene inventory, we
developed an integrated, evidence-based approach
named Otto. The evidence used to increase the
likelihood of identifying genes includes regions
conserved between the mouse and human genomes,
similarity to ESTs or other mRNA-derived data, or
similarity to other proteins. A comparison of Otto
(combined Otto-RefSeq and Otto homology) with
Genscan, a standard gene-prediction algorithm,



..^eture. In the P-^^^^^^^esV 
of the Senome a hum^ cm^^^^^^ pjpe- 

evid^ice P^^;^j„^ *'d~x^es how var- 
line (descnbed below; ^ ^ 
ious types of e^.den« ^^^^J/^^dence in 
curator puts diffe«nU^e^ 

o e^^ce to support gene 
certam patterns of eviO ^ 



association of ESls w. f ^^^^^j^ and 

In stark contras^ ^*^„e 7-35,000 genes 
niuch lower estmiates^oneot . 

derived with Be.^7;':oniunction with 
^^''^^Jl'^frJm-, anoAer of 28,000 



ro&Otto-RefSeqandU^^^^^ V^oc^^^ -^^^r oi 2f^,000 

with Genscan, a standard ge^^.-P^^T chromosome 22 <f^.^l°>'.. ^ comparative 

;Jt^. showed g?f---r,^S63Tof to 34,000 genes^-^ ^J^^^ J 

Lo.50)andspe^^^^^^^^^ -^^^^^fn S^T^id the puffer fi^^^^ 



sus 0.50) and specmcuy v-- ■ 
Otto in tbe ability de^e g^^ 
Otto-predicted ge^^^^e gene-prediction 

-^^''^^"iSS^erbut still si 
programs that exmniw 
iificant, «=vidence that^y m y 
pressed. Conservative cntena, req ^ 

feast two lines of e. -^^^ 
define a set of 26,383 gene 
fidencethatwereus^fon^^^^^^^^^ ^^^^^^ 

ysis presented m *^™,3tablish pre- 
Extensive manuj cm^^ ^^^^jlb, 



annotation. For examp.c, » 
anune homology to a n^^^JJ^^^^r^ect- 
evaluate whether or not curator 
ed into a longer, >^^^J^^3i^. 
would also evaluate tt^e sttengm ^ 
larity and the conj^i^ of the ^^^^ 

essence asking ^he*f...'5^e edges of 
splice-junctions ^d -^^^^^ J sites. 

^ a'TmS^l STotation process was 

rdri:2rre.;o.^^^^^^ 

The Otto system can promoie 



„ 34,000 gems " ;e„e consova- 

Son between bmn»js ana m 

"'S^^l Sertved simply >>> 

^Ses" 1.-™° S "S. , *. basis .t • 



necessary to V'", -„„i, 
initial computanonal approach. 

3.1 Automated gene annotation

A gene is a locus of cotranscribed exons.
A gene is a ^ multiple tran- 

SuSercSVby-ansofalterna- 



pingprotein^mdb. -^^^^^^ ll„e 
computational ^'^1^1^°^^ against pro- 
searches the f ^ff°^l^SeSe 2tabasesto 

Table 6. Cenome-w.de mate^^ ^ . To identify likely gen ^ ^ Otto 

gi^'^l^^^TsTcuenSSS^^^^ 
°" m IS Sf database sequences 

I; Ae region under analysis was 
matched m Ae regio 

compared by an aigo patching se- 

accountboth ccor^ates om^^^ type (e-g- 
quence, as well as me s 4 ^^^^ 

p"«^^Ti»fr..SSl»bi»«t«»«^ 



Genome 
library 



Tz1> 1! ! ^1:9 protein. EST. ^d so f^M-^^^^^^^ 

87-3 -T-TIS usedtogroupthemjchesmtoD ^j^„^i{y 

TSTir;;:;;:;^^ sequences that may define a gene 

mean Ubran, sae. ^ 



used to group - ^e i 
^ o„Bne.twww-« .- sequences that may define a gen 




gene boundaries. During this process, multiple 
hits to the same region were collapsed to a 
coherent set of data by tracking the coverage of 
a region. For example, if a group of bases was 
represented by multiple overlapping ESTs, the 
union of these regions matched by the set of 
ESTs on the scaffold was marked as being 
supported by EST evidence. This resulted in a
series of "gene bins," each of which was believed
to contain a single gene. One weakness of
this initial implementation of the algorithm was
in predicting gene boundaries in regions of tan- 
demly duplicated genes. Gene clusters frequent- 
ly resulted in homologous neighboring genes 
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being joined together, resulting in an annotation 
that artificially concatenated these gene models. 
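
The collapsing of multiple overlapping hits into a single supported region is, in essence, a union of intervals. The following Python sketch shows that step under the assumption that each hit is a (start, end) coordinate pair on the scaffold; the function name and the coordinates are hypothetical and do not come from the Otto implementation.

```python
def merge_hits(hits):
    """Collapse overlapping (start, end) hits into their union of intervals."""
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


# Four EST matches on one scaffold (toy coordinates).
est_hits = [(100, 250), (200, 400), (900, 1000), (380, 450)]
supported = merge_hits(est_hits)
print(supported)
```

A merge of this kind also shows why tandemly duplicated genes are a weak point: if hits from neighboring homologous genes overlap or abut, they fall into one interval and the genes are joined.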

Next, known genes (those with exact match- 
es of a full-length cDNA sequence to the ge- 
nome) were identified, and the region corre- 
sponding to the cDNA was annotated as a 
predicted transcript. A subset of the curated
human gene set RefSeq from the National
Center for Biotechnology Information
(NCBI) was included as a data set searched in 
the computational pipeline. If a RefSeq tran- 
script matched the genome assembly for at least 
50% of its length at >92% identity, then the 
SIM4 (63) alignment of the RefSeq transcript to
the region of the genome under analysis was
promoted to the status of an Otto annotation.
Because the genome sequence has gaps and
sequence errors such as frameshifts, it was not
always possible to predict a transcript that 
agrees precisely with the experimentally deter- 
mined cDNA sequence. A total of 6538 genes 
in our inventory were identified and transcripts 
predicted in this way. 
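
The promotion rule for known genes can be written as a simple predicate. This sketch assumes that "at least 50% of its length" means aligned length divided by transcript length, and that identity is expressed as a fraction; the function and parameter names are hypothetical.

```python
def promote_refseq(aligned_length, transcript_length, identity,
                   min_fraction=0.50, min_identity=0.92):
    """True if a RefSeq transcript matches the assembly over at least half
    of its length at more than 92% identity."""
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    fraction = aligned_length / transcript_length
    return fraction >= min_fraction and identity > min_identity


print(promote_refseq(1200, 2000, identity=0.97))  # 60% of length, 97% identity
print(promote_refseq(900, 2000, identity=0.99))   # only 45% of length
print(promote_refseq(1500, 2000, identity=0.92))  # identity not above 92%
```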

Regions that have a substantial amount of 
sequence similarity, but do not match known 
genes, were analyzed by that part of the Otto 
system that uses the sequence similarity
information to predict a transcript. Here, Otto
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Fig. 6. Comparison of the CSA and the PFP assembly. 
(A) All of chromosome 21, (B) all of chromosome 8,
and (C) a 1-Mb region of chromosome 8 representing 
a single Celera scaffold. To generate the figure, Celera 
fragment sequences were mapped onto each assem- 
bly. The PFP assembly is indicated in the upper third 
of each panel; the Celera assembly is indicated in the 
lower third. In the center of the panel green lines 
show Celera sequences that are in the same order and 
orientation in both assemblies and form the longest 
consistently ordered run of sequences. Yellow lines 
indicate sequence blocks that are in the same
orientation, but out of order. Red lines indicate sequence
blocks that are not in the same orientation. For 
clarity, in the latter two cases, lines are only drawn 
between segments of matching sequence that are at 
least 50 kbp long. The top and bottom thirds of each 
panel show the extent of Celera mate-pair violations
(red, misoriented; yellow, incorrect distance between 
the mates) for each assembly grouped by library size. 
(Mate pairs that are within the correct distance, as 
expected from the mean library insert size, are omit- 
ted from the figure for clarity.) Predicted breakpoints, 
corresponding to stacks of violated mate pairs of the 
same type, are shown as blue ticks on each assembly 
axis. Runs of more than 10,000 Ns are shown as cyan 
bars. Plots of all 24 chromosomes can be seen in Web 
fig. 3 on Science Online at
www.sciencemag.org/cgi/content/full/291/5507/1304/DC1.
evaluates evidence generated by the computational
pipeline, corresponding to conservation between
mouse and human genomic DNA, similarity to human
transcripts (ESTs and cDNAs), similarity to rodent
transcripts (ESTs and cDNAs), and similarity of the
translation of human genomic DNA to known proteins
to predict potential genes in the human genome. The
sequence from the region of genomic DNA contained in
a gene bin was extracted, and the subsequences
supported by any homology evidence were marked
(plus 100
Fig. 7. Schematic view of the distribution of breakpoints and large gaps
on all chromosomes. For each chromosome, the upper pair of lines
represent the PFP assembly, and the lower pair of lines represent Celera's
assembly. Blue tick marks represent breakpoints, whereas red tick marks
represent a gap of larger than 10,000 bp. The number of breakpoints per
chromosome is indicated in black, and the chromosome numbers in red.
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bases flanking these regions). The other bases 
in the region, those not covered by any homology
evidence, were replaced by N's. This sequence
segment, with high confidence regions represented by
the consensus genomic sequence and the remainder
represented by N's, was then evaluated by Genscan to
see if a consistent gene model could be generated.
This procedure simplified the gene-prediction task by
first establishing the boundary for the gene (not a
strength of most gene-finding algorithms), and by
eliminating regions with no supporting evidence. If
Genscan returned a plausible gene model, it was
further evaluated before being promoted to an "Otto"
annotation. The final Genscan predictions were often
quite different from the prediction that Genscan
returned on the same region of native genomic
sequence. A weakness of using Genscan to refine the
gene model is the loss of valid, small exons from the
final annotation.
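
The masking step described above (keep evidence-supported subsequences plus flanking bases, replace everything else with N) can be sketched as follows. The function name, the half-open interval convention, and the toy sequence are assumptions made for illustration; the text gives only the flank size of 100 bases.

```python
def mask_unsupported(sequence, evidence, flank=100):
    """Replace bases outside the evidence intervals (plus flanks) with 'N'.

    evidence: list of half-open (start, end) intervals with homology support.
    """
    keep = [False] * len(sequence)
    for start, end in evidence:
        lo = max(0, start - flank)
        hi = min(len(sequence), end + flank)
        for i in range(lo, hi):
            keep[i] = True
    return "".join(base if k else "N" for base, k in zip(sequence, keep))


# Toy example with a 2-base flank so the effect is visible on a short string.
masked = mask_unsupported("ACGTACGTACGTACGT", [(6, 8)], flank=2)
print(masked)
```

The masked string has the same length as the input, so coordinates of any gene model predicted on it map directly back to the original region.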

The next step in defining gene structures 
based on sequence similarity was to compare 
each predicted transcript with the homology- 
based evidence that was used in previous steps 
to evaluate the depth of evidence for each exon 
in the prediction. Internal exons were considered
to be supported if they were covered by
homology evidence to within ±10 bases of
their edges. For first and last exons, the internal
edge was required to be within 10 bases, but the
external edge was allowed greater latitude to
allow for 5' and 3' untranslated regions
(UTRs). To be retained, a prediction for a
multi-exon gene must have evidence such that
the total number of "hits," as defined above,
divided by the number of exons in the predic- 
tion must be >0.66 or must correspond to a 
RefSeq sequence. A single-exon gene must be 
covered by at least three supporting hits (±10 
bases on each side), and these must cover the 
complete predicted open reading frame. For
a single-exon gene, we also required that 
the Genscan prediction include both a start 
and a stop codon. Gene models that did not 
meet these criteria were disregarded, and 

Table 7. Sensitivity and specificity of Otto and
Genscan. Sensitivity and specificity were calculated
by first aligning the prediction to the published
RefSeq transcript, tallying the number (N) of
uniquely aligned RefSeq bases. Sensitivity is the
ratio of N to the length of the published RefSeq
transcript. Specificity is the ratio of N to the
length of the prediction. All differences are
significant (Tukey HSD; P < 0.001).

Method                 Sensitivity  Specificity
Otto (RefSeq only)*    0.939        0.973
Otto (homology)†       0.604        0.884
Genscan                0.501        0.633

*Refers to those annotations produced by Otto using only
the Sim4-polished RefSeq alignment rather than an
evidence-based Genscan prediction.
†Refers to those annotations produced by supplying all
available evidence to Genscan.



those that passed were promoted to Otto 
predictions. Homology-based Otto predic- 
tions do not contain 3' and 5' untranslated 
sequence. Although three de novo gene-finding 
programs [GRAIL, Genscan, and FgenesH 
(63)] were run as part of the computational 
analysis, the results of these programs were not 
directly used in making the Otto predictions. 
Otto predicted 11,226 additional genes by 
means of sequence similarity. 
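
The retention rule for multi-exon predictions can be approximated in Python. This is a simplified sketch under stated assumptions: a "hit" is taken to mean an exon whose two edges each lie within 10 bases of the edges of some evidence interval, the extra latitude allowed for the outer edges of first and last exons is ignored, and all names are hypothetical.

```python
def exon_supported(exon, hit, tol=10):
    """True if both edges of the hit lie within tol bases of the exon's edges."""
    return abs(exon[0] - hit[0]) <= tol and abs(exon[1] - hit[1]) <= tol


def keep_multi_exon(exons, hits, is_refseq=False, min_ratio=0.66, tol=10):
    """Retain a multi-exon prediction if supported exons / exons > min_ratio,
    or if the prediction corresponds to a RefSeq sequence."""
    supported = sum(
        any(exon_supported(exon, hit, tol) for hit in hits) for exon in exons
    )
    return is_refseq or supported / len(exons) > min_ratio


exons = [(100, 200), (300, 400), (500, 600)]
print(keep_multi_exon(exons, [(95, 205), (300, 398)]))  # 2 of 3 exons supported
print(keep_multi_exon(exons, [(95, 205)]))              # 1 of 3 exons supported
```

With a threshold of 0.66, a three-exon prediction needs two supported exons, since 2/3 is just above the cutoff.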

3.2 Otto validation 

To validate the Otto homology-based process and the
method that Otto uses to define the structures of
known genes, we compared transcripts predicted by
Otto with their corresponding (and presumably
correct) transcript from a set of 4512 RefSeq
transcripts for which there was a unique SIM4
alignment (Table 7). In order to evaluate the
relative performance of Otto and Genscan, we made
three comparisons. The first involved a determination
of the accuracy of gene models predicted by Otto with
only homology data other than the corresponding
RefSeq sequence (Otto homology in Table 7). We
measured the sensitivity (correctly predicted bases
divided by the total length of the cDNA) and
specificity (correctly predicted bases divided by the
sum of the correctly and incorrectly predicted
bases). Second, we examined the sensitivity and
specificity of the Otto predictions that were made
solely with the RefSeq sequence, which is the process
that Otto uses to annotate known genes (Otto-RefSeq).
And third, we determined the accuracy of the Genscan
predictions corresponding to these RefSeq sequences.
As expected, the alignment method (Otto-RefSeq) was
the most accurate, and Otto-homology performed better
than Genscan by both criteria. Thus, 6.1% of true
RefSeq nucleotides were not represented in the
Otto-RefSeq annotations and 2.7% of the nucleotides
in the Otto-RefSeq transcripts were not con- 
tained in the original RefSeq transcripts. The 
discrepancies could come from legitimate 
differences between the Celera assembly 
and the RefSeq transcript due to polymor- 
phisms, incomplete or incorrect data in the 
Celera assembly, errors introduced by Sim4 
during the alignment process, or the presence
of alternatively spliced forms in the
data set used for the comparisons. 
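
The two accuracy measures are simple ratios of the number of uniquely aligned bases. The sketch below uses hypothetical toy lengths, chosen only so that the ratios reproduce the Otto-RefSeq figures of 0.939 and 0.973; they are not values from the study.

```python
def sensitivity_specificity(n_aligned, refseq_length, prediction_length):
    """Sensitivity = N / RefSeq transcript length;
    specificity = N / prediction length,
    where N is the number of uniquely aligned RefSeq bases."""
    return n_aligned / refseq_length, n_aligned / prediction_length


# Toy transcript: 2000-base RefSeq, 1930-base prediction, 1878 bases aligned.
sens, spec = sensitivity_specificity(1878, 2000, 1930)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"missed={100 * (1 - sens):.1f}% extra={100 * (1 - spec):.1f}%")
```

The complements of the two ratios correspond to the 6.1% of RefSeq nucleotides missing from the annotations and the 2.7% of annotated nucleotides absent from RefSeq.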

Because Otto uses an evidence-based approach to
reconstruct genes, the absence of experimental
evidence for intervening exons may inadvertently
result in a set of exons that cannot be spliced
together to give rise to a transcript. In such cases,
Otto may "split genes" when in fact all the evidence
should be combined into a single transcript. We also
examined the tendency of these methods to incorrectly
split gene predictions. These trends are shown in
Fig. 8. Both RefSeq and homology-based predictions by
Otto split known genes into fewer segments than
Genscan alone.



3.3 Gene number 

Recognizing that the Otto system is quite
conservative, we used a different gene-prediction
strategy in regions where the homology
evidence was less strong. Here the
results of de novo gene predictions were
used. For these genes, we insisted that a
predicted transcript have at least two of the
following types of evidence to be included
in the gene set for further analysis: protein,
human EST, rodent EST, or mouse genome
fragment matches. This final class of predicted
genes is a subset of the predictions
made by the three gene-finding programs
that were used in the computational pipeline.
For these, there was not sufficient
sequence similarity information for Otto to
attempt to predict a gene structure. The
three de novo gene-finding programs resulted
in about 155,695 predictions, of
which ~76,410 were nonredundant (nonoverlapping
with one another). Of these,
57,935 did not overlap known genes or
predictions made by Otto. Only 21,350 of
the gene predictions that did not overlap
Otto predictions were partially supported
by at least one type of sequence similarity
evidence, and 8619 were partially supported
by two types of evidence (Table 8).
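
The evidence requirement described above amounts to a simple counting filter. The sketch below is illustrative only: the `Prediction` class, the evidence labels, and the toy records are assumptions introduced here, not structures from the annotation pipeline.

```python
from dataclasses import dataclass, field

# The four evidence categories named in the text (the labels are hypothetical).
EVIDENCE_TYPES = {"protein", "human_est", "rodent_est", "mouse_genome"}


@dataclass
class Prediction:
    name: str
    evidence: set = field(default_factory=set)


def has_enough_evidence(prediction, min_types=2):
    """True if the prediction is supported by at least min_types evidence categories."""
    return len(prediction.evidence & EVIDENCE_TYPES) >= min_types


# Toy de novo predictions (hypothetical).
predictions = [
    Prediction("p1", {"protein", "human_est"}),
    Prediction("p2", {"rodent_est"}),
    Prediction("p3"),
]
accepted = [p.name for p in predictions if has_enough_evidence(p)]
print(accepted)
```

Raising `min_types` reproduces the increasingly stringent subsets discussed in the text.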

The sum of this number (21,350) and the
number of Otto annotations (17,764), 39,114,
is near the upper limit for the human gene
complement. As seen in Table 8, if the requirement
for other supporting evidence is
made more stringent, this number drops rapidly,
so that demanding two types of evidence
reduces the total gene number to 26,383 and
demanding three types reduces it to ~23,000.
Requiring that a prediction be supported by
all four categories of evidence is too stringent
because it would eliminate genes that encode
novel proteins (members of currently undescribed
protein families). No correction for
pseudogenes has been made at this point in
the analysis.

In a further attempt to identify genes that
were not found by the autoannotation process
or any of the de novo gene finders, we examined
regions outside of gene predictions
that were similar to the EST sequence, and
where the EST matched the genomic sequence
across a splice junction. After correcting
for potential 3' UTRs of predicted genes,
about 2500 such regions remained. Addition
of a requirement for at least one of the following
evidence types (homology to mouse
genomic sequence fragments, rodent ESTs,
or cDNAs) or similarity to a known protein
reduced this number to 1010. Adding this to
the numbers from the previous paragraph
would give us estimates of about 40,000,
27,000, and 24,000 potential genes in the
human genome, depending on the stringency
of evidence considered. Table 8 illustrates the
number of genes and presents the degree of
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confidence based on the supporting evidence. 
Transcripts encoded by a set of 26,383 genes
were assembled for further analysis. This set
includes the 6538 genes predicted by Otto on
the basis of matches to known genes, 11,226
transcripts predicted by Otto based on homology
evidence, and 8619 from the subset of
transcripts from de novo gene-prediction programs
that have two types of supporting evidence.
The 26,383 genes are illustrated along
chromosome diagrams in Fig. 1. These are a
very preliminary set of annotations and are
subject to all the limitations of an automated
process. Considerable refinement is still necessary
to improve the accuracy of these transcript
predictions. All the predictions and
descriptions of genes and the associated evidence
that we present are the product of
completely computational processes, not expert
curation. We have attempted to enumerate
the genes in the human genome in such a
way that we have different levels of confidence
based on the amount of supporting
evidence: known genes, genes with good protein
or EST homology evidence, and de novo
gene predictions confirmed by modest homology
evidence.
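
The gene-number estimates in this section follow from simple sums of the counts quoted in the text. The short check below only restates that arithmetic; the variable names are ours.

```python
# Counts quoted in the text.
otto_known_genes = 6538   # Otto, matches to known genes
otto_homology = 11226     # Otto, homology evidence
de_novo = {1: 21350, 2: 8619}  # de novo predictions by minimum number of evidence types
est_only_regions = 1010   # spliced-EST regions outside any gene prediction

otto_total = otto_known_genes + otto_homology
totals = {k: otto_total + n for k, n in de_novo.items()}
totals_with_est = {k: t + est_only_regions for k, t in totals.items()}

print(otto_total)       # Otto annotations
print(totals)           # Otto plus de novo, by evidence stringency
print(totals_with_est)  # after adding the EST-only regions
```

The sums give 17,764 Otto annotations, 39,114 and 26,383 genes at the one- and two-evidence levels, and roughly 40,000 and 27,000 once the 1010 EST-supported regions are added, in agreement with the rounded figures above.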

3.4 Features of human gene 
transcripts 

We estimate the average span for a "typi- 
cal" gene in the human DNA sequence to 
be about 27,894 bases. This is based on the 
average span covered by RefSeq tran- 
scripts, used because it represents our high- 
est confidence set. 

The set of transcripts promoted to gene 
annotations varies in a number of ways. As 
can be seen from Table 8 and Fig. 9, tran- 
scripts predicted by Otto tend to be longer, 
having on average about 7.8 exons, whereas 
those promoted from gene-prediction pro- 
grams average about 3.7 exons. The largest 
number of exons that we have identified in a 
transcript is 234 in the titin mRNA. Table 8 
compares the amounts of evidence that sup- 
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port the Otto and other predicted transcripts. 
For example, one can see that a typical Otto 
transcript has 6.99 of its 7.81 exons supported 
by protein homology evidence. As would be 
expected, the Otto transcripts generally have 
more support than do transcripts predicted by 
the de novo methods. 

4 Genome Structure 

Summary. This section describes several of
the noncoding attributes of the assembled
genome sequence and their correlations with
the predicted gene set. These include an analysis
of G+C content and gene density in the
context of cytogenetic maps of the genome,
an enumerative analysis of CpG islands, and
a brief description of the genome-wide repetitive
elements.
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4.1 Cytogenetic maps 

Perhaps the most obvious, and certainly the 
most visible, element of the structure of 
the genome is the banding pattern produced 
by Giemsa stain. Chromosomal banding 
studies have revealed that about 17% to 
20% of the human chromosome comple- 
ment consists of C-bands, or constitutive 
heterochromatin (64). Much of this heterochromatin
is highly polymorphic and consists
of different families of alpha satellite
DNAs with various higher order repeat 
structures (65). Many chromosomes have 
complex inter- and intrachromosomal du- 
plications present in pericentromeric re- 
gions (66). About 5% of the sequence reads 
were identified as alpha satellite sequences; 
these were not included in the assembly. 



[Figure 8: bar chart. Legend: Otto (homology), Otto (RefSeq only), Genscan. Horizontal axis: number of predictions per RefSeq transcript (0 to 17).]

Fig. 8. Analysis of split genes resulting from different annotation methods. A set of 4512
Sim4-based alignments of RefSeq transcripts to the genomic assembly were chosen (see the text
for criteria), and the numbers of overlapping Genscan, Otto (RefSeq only) annotations based solely
on Sim4-polished RefSeq alignments, and Otto (homology) annotations (annotations produced by
supplying all available evidence to Genscan) were tallied. These data show the degree to which
multiple Genscan predictions and/or Otto annotations were associated with a single RefSeq
transcript. The zero class for the Otto-homology predictions shown here indicates that the
Otto-homology calls were made without recourse to the RefSeq transcript and thus no Otto call
was made because of insufficient evidence.



Table 8. Numbers of exons and transcripts supported by various types of evidence for Otto and de novo gene prediction methods. Highlighted cells indicate
the gene sets analyzed in this paper (boldface, set of genes selected for protein analysis; italic, total set of accepted de novo predictions).

                                         Types of evidence               No. of lines of evidence*
                              Total    Mouse   Rodent  Protein    Human       ≥1       ≥2       ≥3       ≥4
Otto
  Number of transcripts      17,969   17,065   14,881   15,477   16,374  17,968†   17,501   15,877   12,451
  Number of exons           141,218  111,174   89,569  108,431  118,869  140,710  127,955   99,574   59,804
De novo
  Number of transcripts      58,032   14,463    5,094    8,043    9,220   21,350    8,619    4,947    1,904
  Number of exons           319,935   48,594   19,344   26,264   40,104   79,148   31,130   17,508    6,520
No. of exons per transcript
  Otto                         7.84     5.77     6.01     6.99     7.24     7.81     7.19     6.00     4.28
  De novo                      5.53     3.17     3.80     3.27     4.36      3.7     3.56     3.42     3.16

*Evidence types: conservation in 3X mouse genomic DNA, similarity to human EST or cDNA, similarity to rodent EST or cDNA, and similarity to known proteins. [Remainder of footnote illegible in source.]
†This number includes alternative splice forms of the 17,764 genes mentioned elsewhere in the text.
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Examination of pericentromeric regions is 
ongoing. 

The remaining ~80% of the genome, the
euchromatic component, is divisible into G-,
R-, and T-bands (67). These cytogenetic bands
have been presumed to differ in their nucleotide
composition and gene density, although we
have been unable to determine precise band
boundaries at the molecular level. T-bands are
the most G+C- and gene-rich, and G-bands are
G+C-poor (68). Bernardi has also offered a
description of the euchromatin at the molecular
level as long stretches of DNA of differing base
composition, termed isochores (denoted L, H1,
H2, and H3), which are >300 kbp in length
(69). Bernardi defined the L (light) isochores as
G+C-poor (<43%), whereas the H (heavy)
isochores fall into three G+C-rich classes representing
24, 8, and 5% of the genome. Gene
concentration has been claimed to be very low
in the L isochores and 20-fold more enriched in
the H2 and H3 isochores (70). By examining
contiguous 50-kbp windows of G+C content
across the assembly, we found that regions of
G+C content >48% (H3 isochores) averaged
273.9 kbp in length, those with G+C content
between 43 and 48% (H1+H2 isochores) averaged
202.8 kbp in length, and the average span
of regions with <43% (L isochores) was
1078.6 kbp. The correlation between G+C
content and gene density was also examined in
50-kbp windows along the assembled sequence
(Table 9 and Figs. 10 and 11). We found that
the density of genes was greater in regions of
high G+C than in regions of low G+C content,
as expected. However, the correlation between
G+C content and gene density was not as
skewed as previously predicted (69). A higher
proportion of genes were located in the G+C-poor
regions than had been expected.
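
The window-based G+C analysis described above can be sketched as follows. This is a minimal illustration, assuming non-overlapping windows, the three G+C thresholds given in the text, and a region defined as a run of consecutive windows in the same class; the function names are hypothetical, and the toy example uses a 10-bp window in place of 50 kbp.

```python
from itertools import groupby


def gc_percent(seq):
    """Percent G+C among the determined (A, C, G, T) bases of seq."""
    seq = seq.upper()
    known = sum(seq.count(base) for base in "ACGT")
    if known == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / known


def isochore_class(gc):
    """Map a G+C percentage to the classes used in the text."""
    if gc > 48:
        return "H3"
    if gc >= 43:
        return "H1+H2"
    return "L"


def classify_windows(seq, window=50_000):
    """Classify each contiguous, non-overlapping window of the sequence."""
    return [
        isochore_class(gc_percent(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, window)
    ]


def mean_region_length(classes, window=50_000):
    """Average length of runs of consecutive windows that share a class."""
    runs = {}
    for cls, group in groupby(classes):
        runs.setdefault(cls, []).append(sum(1 for _ in group) * window)
    return {cls: sum(lengths) / len(lengths) for cls, lengths in runs.items()}


# Toy sequence (hypothetical): one G+C-rich window followed by two A+T-rich windows.
toy = "GC" * 5 + "AT" * 10
classes = classify_windows(toy, window=10)
print(classes, mean_region_length(classes, window=10))
```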

Chromosomes 17, 19, and 22, which have
a disproportionate number of H3-containing 
bands, had the highest gene density (Table 
10). Conversely, of the chromosomes that we 
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found to have the lowest gene density, X, 4, 
18, 13, and Y, also have the fewest H3 bands.
Chromosome 15, which also has few H3
bands, did not have a particularly low gene
density in our analysis. In addition, chromosome
8, which we found to have a low gene
density, does not appear to be unusual in its
H3 banding.

How valid is Ohno's postulate (71) that
mammalian genomes consist of oases of genes
in otherwise essentially empty deserts? It appears
that the human genome does indeed contain
deserts, or large, gene-poor regions. If we
define a desert as a region >500 kbp without a
gene, then we see that 605 Mbp, or about 20%
of the genome, is in deserts. These are not
uniformly distributed over the various chromosomes.
Gene-rich chromosomes 17, 19, and 22
have only about 12% of their collective 171
Mbp in deserts, whereas gene-poor chromosomes
4, 13, 18, and X have 27.5% of their 492
Mbp in deserts (Table 11). The apparent lack of
predicted genes in these regions does not necessarily
imply that they are devoid of biological
function.
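
The desert definition used above (a region of more than 500 kbp without a gene) can be applied to a list of gene coordinates as in the sketch below. The half-open interval convention, the function name, and the toy coordinates are assumptions for illustration.

```python
def find_deserts(genes, chrom_length, min_size=500_000):
    """Return gene-free intervals longer than min_size.

    genes: iterable of (start, end) half-open intervals on one chromosome.
    """
    deserts = []
    prev_end = 0
    for start, end in sorted(genes):
        if start - prev_end > min_size:
            deserts.append((prev_end, start))
        prev_end = max(prev_end, end)  # tolerate overlapping or nested genes
    if chrom_length - prev_end > min_size:
        deserts.append((prev_end, chrom_length))
    return deserts


# Toy chromosome of 2 Mbp with three genes (hypothetical coordinates).
genes = [(100_000, 150_000), (900_000, 950_000), (1_000_000, 1_200_000)]
deserts = find_deserts(genes, chrom_length=2_000_000)
desert_bp = sum(end - start for start, end in deserts)
print(deserts, desert_bp)
```

As a consistency check on the figures in the text, 605 Mbp of desert in a 2.91-Gbp genome is about 21%, in line with the stated "about 20%".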

4.2 Linkage map 

Linkage maps provide the basis for genetic
analysis and are widely used in the study of the
inheritance of traits and in the positional cloning
of genes. The distance metric, centimorgans
(cM), is based on the recombination rate between
homologous chromosomes during meiosis.
In general, the rate of recombination in
females is greater than that in males, and this
degree of map expansion is not uniform across
the genome (72). One of the opportunities enabled
by a nearly complete genome sequence is
to produce the ultimate physical map, and to
fully analyze its correspondence with two other
maps that have been widely used in genome
and genetic analysis: the linkage map and the
cytogenetic map. This would close the loop
between the mapping and sequencing phases of
the genome project.

We mapped the location of the markers
that constitute the Genethon linkage map to
the genome. The rate of recombination, expressed
as cM per Mbp, was calculated for
3-Mbp windows as shown in Table 12. Higher
rates of recombination in the telomeric
region of the chromosomes have been previously
documented (73). From this mapping
result, there is a difference of 4.99 between
lowest rates and highest rates and the largest
difference of 4.4 between males and females
(4.99 to 0.47 on chromosome 16). This indicates
that the variability in recombination
rates among regions of the genome exceeds
the differences in recombination rates between
males and females. The human genome
has recombination hotspots, where recombination
rates vary fivefold or more over
a space of 1 kbp, so the picture one gets of the
magnitude of variability in recombination
rate will depend on the size of the window
examined. Unfortunately, too few meiotic
crossovers have occurred in Centre d'Etude
du Polymorphism Humain (CEPH) and other
reference families to provide a resolution any
finer than about 3 Mbp. The next challenge
will be to determine a sequence basis of
recombination at the chromosomal level. An
accurate predictor for the rate for variation in
recombination rates between any pair of
markers would be extremely useful in designing
markers to narrow a region of linkage,
such as in positional cloning projects.

Table 9. Characteristics of G+C in isochores.

                        Fraction of genome (%)     Fraction of genes (%)
Isochore    G+C (%)     Predicted*   Observed      Predicted*   Observed
H3          >48              5          9.5            37         24.8
H1/H2       43-48           25         21.2            32         26.6
L           <43             67         69.2            31         48.5

*The predictions were based on Bernardi's definitions (70) of the isochore structure of the human genome.

Fig. 9. Comparison of the number of exons per transcript between
the 17,968 Otto transcripts and 21,350 de novo transcript predictions
with at least one line of evidence that do not overlap with an
Otto prediction. Both sets have the highest number of transcripts
in the two-exon category, but the de novo gene predictions are
skewed much more toward smaller transcripts. In the Otto set,
19.7% of the transcripts have one or two exons, and 5.7%
have more than 20. In the de novo set, 49.3% of the transcripts have one or two exons, and 0.2% have more than 20.

4.3 Correlation between CpG islands
and genes

CpG islands are stretches of unmethylated
DNA with a higher frequency of CpG
dinucleotides when compared with the entire
genome (74). CpG islands are believed to
preferentially occur at the transcriptional start
of genes, and it has been observed that most
housekeeping genes have CpG islands at the
5' end of the transcript (75, 76). In addition,
experimental evidence indicates that CpG island
methylation is correlated with gene inactivation
(77) and has been shown to be
important during gene imprinting (78) and
tissue-specific gene expression (79).

Experimental methods have been used
that resulted in an estimate of 30,000 to
45,000 CpG islands in the human genome
(74, 80) and an estimate of 499 CpG islands
on human chromosome 22 (81). Larsen et
al. (76) and Gardiner-Garden and Frommer
(75) used a computational method to identify
CpG islands and defined them as regions
of DNA of >200 bp that have a G+C
content of >50% and a ratio of observed
versus expected frequency of CG dinucleotide
≥0.6.
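
The computational definition quoted above can be written out as a short check on a candidate sequence. The sketch assumes the usual Gardiner-Garden and Frommer form of the observed/expected ratio, (number of CG dinucleotides × sequence length) / (number of C × number of G); the function names and toy sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CG dinucleotide ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def meets_island_criteria(seq, min_len=200, min_gc=0.50, min_ratio=0.6):
    """Apply the three criteria: length >200 bp, G+C >50%, obs/exp >= 0.6."""
    seq = seq.upper()
    if len(seq) <= min_len:
        return False
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    return gc_fraction > min_gc and cpg_obs_exp(seq) >= min_ratio


# Toy sequences (hypothetical).
print(meets_island_criteria("CG" * 150))  # long and CpG-rich
print(meets_island_criteria("AT" * 150))  # long but contains no G or C
print(meets_island_criteria("CG" * 50))   # CpG-rich but only 100 bp
```

Passing `min_ratio=0.8` gives the stricter threshold that the text later calls method 2.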

It is difficult to make a direct comparison
of experimental definitions of CpG islands
with computational definitions because
computational methods do not consider
the methylation state of cytosine and
experimental methods do not directly select
regions of high G+C content. However, we
can determine the correlation of CpG island
with gene starts, given a set of annotated
genomic transcripts and the whole genome
sequence. We have analyzed the publicly
available annotation of chromosome 22, as
well as using the entire human genome in
our assembly and the computationally annotated
genes. A variation of the CpG island
computation was compared with
Larsen et al. (76). The main differences are
that we use a sliding window of 200 bp,
consecutive windows are merged only if
they overlap, and we recompute the CpG
value upon merging, thus rejecting any potential
island if it scores less than the
threshold.

To compute various CpG statistics, we
used two different thresholds of CG dinucleotide
likelihood ratio. Besides using the original
threshold of 0.6 (method 1), we used a
higher threshold of CG dinucleotide likelihood
ratio of 0.8 (method 2), which results in
the number of CpG islands on chromosome
22 close to the number of annotated genes on
this chromosome. The main results are summarized
in Table 13. CpG islands computed
with method 1 predicted only 2.6% of the
CSA sequence as CpG, but 40% of the gene
starts (start codons) are contained inside a
CpG island. This is comparable to ratios reported
by others (82). The last two rows of
the table show the observed and expected
average distance, respectively, of the closest
CpG island from the first exon. The observed
average distances to the closest CpG island are
smaller than the corresponding expected distances,
confirming an association between CpG islands
and the first exon.

We also looked at the distribution of CpG
island nucleotides among various sequence
classes such as intergenic regions, introns,
exons, and first exons. We computed the
likelihood score for each sequence class as
the ratio of the observed fraction of CpG
island nucleotides in that sequence class
and the expected fraction of CpG island
nucleotides in that sequence class. The results
of applying method 1 on CSA were
scores of 0.89 for intergenic region, 1.2 for
intron, 5.86 for exon, and 13.2 for first
exon. The same trend was also found for
chromosome 22 and after the application of
a higher threshold (method 2) on both data
sets. In sum, genome-wide analysis has
extended earlier analysis and suggests a
strong correlation between CpG islands and
first coding exons.
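
The likelihood score defined above is an observed-to-expected ratio per sequence class. The sketch below assumes, as one plausible reading, that the expected fraction for a class is that class's share of all bases in the sequence; the text does not spell this out, and the function name and toy counts are hypothetical. Unlike the classes in the text, where first exons are a subset of exons, the toy classes are disjoint.

```python
def likelihood_scores(island_bases, class_bases):
    """Observed / expected fraction of CpG island nucleotides per sequence class.

    island_bases: CpG island nucleotides falling in each class.
    class_bases:  total nucleotides in each class (assumed basis for the expectation).
    """
    total_island = sum(island_bases.values())
    total_bases = sum(class_bases.values())
    scores = {}
    for cls, size in class_bases.items():
        observed = island_bases.get(cls, 0) / total_island
        expected = size / total_bases
        scores[cls] = observed / expected
    return scores


# Toy counts (hypothetical, disjoint classes).
class_bases = {"intergenic": 700, "intron": 250, "exon": 50}
island_bases = {"intergenic": 10, "intron": 5, "exon": 5}
scores = likelihood_scores(island_bases, class_bases)
print({cls: round(score, 2) for cls, score in scores.items()})
```

A score above 1 means that CpG island nucleotides are overrepresented in that class, which is the pattern reported above for exons and first exons.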
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4.4 Cenome-wide repetitive elements 

The proportion of the genome covered by 
various classes of repetitive DNA is present- 
ed in Table 14. We observed about 35% of 
the genome in these repeat classes, very similar
to values reported previously (83). Repetitive
sequence may be underrepresented in
the Celera assembly as a result of incomplete 
repeat resolution, as discussed above. About 
8% of the scaffold length is in gaps, and we 
expect that much of this is repetitive se- 
quence. Chromosome 19 has the highest re- 
peat density (57%), as well as the highest 
gene density (Table 10). Of interest, among 
the different classes of repeat elements, we 
observe a clear association of Alu elements 
and gene density, which was not observed 
between LINEs and gene density. 

5 Genome Evolution 

Summary. The dynamic nature of genome
evolution can be captured at several levels.
These include gene duplications mediated by
RNA intermediates (retrotransposition) and
segmental genomic duplications. In this section,
we document the genome-wide occurrence
of retrotransposition events generating
functional (intronless paralogs) or inactive
genes (pseudogenes). Genes involved in
translational processes and nuclear regulation
account for nearly 50% of all intronless paralogs
and processed pseudogenes detected in
our survey. We have also cataloged the extent
of segmental genomic duplication and provide
evidence for 1077 duplicated blocks
covering 3522 distinct genes.

Fig. 11 (continued). Relation among gene density (orange), G+C content
(green), EST density (blue), and Alu density (pink) along the lengths of
each of the chromosomes. Gene density was calculated in 1-Mbp windows.
The percent of G+C nucleotides was calculated in 100-kbp
windows. The number of ESTs and Alu elements is shown per 100-kbp
window.

5.1 Retrotransposition in the human
genome

Retrotransposition of processed mRNA
transcripts into the genome results in functional
genes, called intronless paralogs, or
inactivated genes (pseudogenes). A paralog
refers to a gene that appears in more than
one copy in a given organism as a result of
a duplication event. The existence of both
intron-containing and intronless forms of
genes encoding functionally similar or
identical proteins has been previously described
(84, 85). Cataloging these evolutionary
events on the genomic landscape is
of value in understanding the functional
consequences of such gene-duplication
events in cellular biology. Identification of
conserved intronless paralogs in the mouse
or other mammalian genomes should provide
the basis for capturing the evolutionary
chronology of these transposition
events and provide insights into gene loss
and accretion in the mammalian radiation.
A set of proteins corresponding to all 901
Otto-predicted, single-exon genes were subjected
to BLAST analysis against the proteins
encoded by the remaining multiexon predicted
transcripts. Using homology criteria of
70% sequence identity over 90% of the
length, we identified 298 instances of single-
to multi-exon correspondence. Of these 298
sequences, 97 were represented in the GenBank
data set of experimentally validated
full-length genes at the stringency specified
and were verified by manual inspection.
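
The homology criteria above (70% identity over 90% of the length) can be expressed as a filter on alignment records. In the sketch below, the `Hit` record, its field names, and the choice to measure coverage against the query (single-exon) protein are assumptions; the text does not state which sequence the 90% refers to, and no particular BLAST output format is implied.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str               # single-exon protein (hypothetical identifier)
    subject: str             # multi-exon protein (hypothetical identifier)
    percent_identity: float
    aligned_length: int
    query_length: int


def passes_criteria(hit, min_identity=70.0, min_coverage=0.90):
    """True if the hit has >= min_identity over >= min_coverage of the query length."""
    coverage = hit.aligned_length / hit.query_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


# Toy hits (hypothetical).
hits = [
    Hit("single_1", "multi_7", 92.0, 190, 200),  # passes both criteria
    Hit("single_2", "multi_3", 65.0, 200, 200),  # identity too low
    Hit("single_3", "multi_9", 88.0, 120, 200),  # coverage too low
]
matches = [h.query for h in hits if passes_criteria(h)]
print(matches)
```

The same filter with `min_coverage=0.70` corresponds to the looser criteria used for the pseudogene survey in section 5.2.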
We believe that these 97 cases may represent
intronless paralogs (see Web table 1 on
Science Online at www.sciencemag.org/cgi/ 
content/full/291/5507/1304/DCl) of known 
genes. Most of these are flanked by direct 
repeat sequences, although the precise nature 
of these repeats remains to be determined. All 
of the cases for which we have high confi- 
dence contain polyadenylated [poly(A)] tails 
characteristic of retrotransposition. 

Recent publications describing the phe- 
nomenon of functional intronless paralogs 
speculate that retrotransposition may serve as 
a mechanism used to escape X-chromosomal 
inactivation (84, 86). We do not find a bias
toward X chromosome origination of these 
retrotransposed genes; rather, the results 
show a random chromosome distribution of 
both the intron-containing and corresponding 
intronless paralogs. We also have found sev- 
eral cases of retrotransposition from a single 
source chromosome to multiple target chro- 
mosomes. Interesting examples include the 
retrotransposition of a five exon-containing 
ribosomal protein L21 gene on chromosome 
13 onto chromosomes 1, 3, 4, 7, 10, and 14, 
respectively. The size of the source genes can 
also show variability. The largest example is 
the 31-exon diacylglycerol kinase zeta gene 
on chromosome 11 that has an intronless 
paralog on chromosome 13. Regardless of 
route, retrotransposition with subsequent 
gene changes in coding or noncoding regions 
that lead to different functions or expression 
patterns, represents a key route to providing 
an enhanced functional repertoire in mammals
(87).

Our preliminary set of retrotransposed intronless
paralogs contains a clear overrepresentation
of genes involved in translational
processes (40% ribosomal proteins and 10% 
translation elongation factors) and nuclear 
regulation (HMG nonhistone proteins, 4%), 
as well as metabolic and regulatory enzymes. 
EST matches specific to a subset of intronless 
paralogs suggest expression of these intron- 
less paralogs. Differences in the upstream 
regulatory sequences between the source 
genes and their intronless paralogs could ac- 
count for differences in tissue-specific gene 
expression. Defining which, if any, of these 
processed genes are functionally expressed
and translated will require further elucidation 
and experimental validation. 



5.2 Pseudogenes 

A pseudogene is a nonfunctional copy that is
very similar to a normal gene but that has
been altered slightly so that it is not expressed.
We developed a method for the preliminary
analysis of processed pseudogenes
in the human genome as a starting point in
elucidating the ongoing evolutionary forces
that account for gene inactivation. The general
structural characteristics of these processed
pseudogenes include the complete
lack of intervening sequences found in the
functional counterparts, a poly(A) tract at the
3' end, and direct repeats flanking the pseudogene
sequence. Processed pseudogenes occur
as a result of retrotransposition, whereas
unprocessed pseudogenes arise from segmental
genome duplication.

Table 11. Genome overview.

Size of the genome (including gaps)                                   2.91 Gbp
Size of the genome (excluding gaps)                                   2.66 Gbp
Longest contig                                                        1.99 Mbp
Longest scaffold                                                      14.4 Mbp
Percent of A+T in the genome                                          54
Percent of G+C in the genome                                          38
Percent of undetermined bases in the genome                           9
Most GC-rich 50 kb                                                    Chr. 2 (66%)
Least GC-rich 50 kb                                                   Chr. X (25%)
Percent of genome classified as repeats                               35
Number of annotated genes                                             26,383
Percent of annotated genes with unknown function                      42
Number of genes (hypothetical and annotated)                          39,114
Percent of hypothetical and annotated genes with unknown function     59
Gene with the most exons                                              Titin (234 exons)
Average gene size                                                     27 kbp
Most gene-rich chromosome                                             Chr. 19 (23 genes/Mb)
Least gene-rich chromosomes                                           Chr. 13 (5 genes/Mb), Chr. Y (5 genes/Mb)
Total size of gene deserts (>500 kb with no annotated genes)          605 Mbp
Percent of base pairs spanned by genes                                25.5 to 37.8*
Percent of base pairs spanned by exons                                1.1 to 1.4*
Percent of base pairs spanned by introns                              24.4 to 36.4*
Percent of base pairs in intergenic DNA                               74.5 to 63.6*
Chromosome with highest proportion of DNA in annotated exons          Chr. 19 (9.33)
Chromosome with lowest proportion of DNA in annotated exons           Chr. Y (0.36)
Longest intergenic region (between annotated + hypothetical genes)    Chr. 13 (3,038,416 bp)
Rate of SNP variation                                                 1/1250 bp

*In these ranges, the percentages correspond to the annotated gene set (26,383 genes) and the hypothetical +
annotated gene set (39,114 genes), respectively.

Table 12. Rate of recombination per physical distance (cM/Mb) across the genome. Genethon markers
were placed on CSA-mapped assemblies, and then relative physical distances and rates were calculated
in 3-Mb windows for each chromosome. NA, not applicable.

                  Male               Sex-average              Female
Chrom.     Max.   Avg.   Min.     Max.   Avg.   Min.     Max.   Avg.   Min.
1          2.60   1.12   0.23     2.81   1.42   0.52     3.39   1.76   0.68
2          2.23   0.78   0.33     2.65   1.12   0.54     3.17   1.40   0.61
3          2.55   0.86   0.23     2.40   1.07   0.42     2.71   1.30   0.33
4          1.66   0.67   0.15     2.06   1.04   0.60     2.50   1.40   0.77
5          2.00   0.67   0.18     1.87   1.08   0.42     2.26   1.43   0.62
6          1.97   0.71   0.2?     2.57   1.12   0.37     3.47   1.67   0.64
7          2.34   1.16   0.48     1.67   1.17   0.47     2.27   1.21   0.34
8          1.83   0.73   0.14     2.40   1.05   0.46     3.44   1.36   0.43
9          2.01   0.99   0.53     1.95   1.32   0.77     2.63   1.66   0.82
10         3.73   1.03   0.22     3.05   1.29   0.66     2.84   1.51   0.76
11         1.43   0.72   0.31     2.13   0.99   0.47     3.10   1.32   0.49
12         4.12   0.76   0.26     3.35   1.16   0.49     2.93   1.55   0.59
13         1.60   0.75   0.01     1.87   0.95   0.17     2.49   1.19   0.32
14         3.15   0.98   0.18     2.65   1.30   0.62     3.14   1.63   0.75
15         2.28   0.94   0.34     2.31   1.22   0.42     2.53   1.56   0.54
16         1.83   1.00   0.47     2.70   1.55   0.63     4.99   2.32   1.12
17         3.87   0.87   0.00     3.54   1.35   0.34     4.19   1.83   0.94
18         3.12   1.37   0.86     3.75   1.66   0.43     4.35   2.24   0.72
19         3.02   0.97   0.10     2.57   1.41   0.49     2.89   1.75   0.87
20         3.64   0.89   0.00     2.79   1.50   0.83     3.31   2.15   1.34
21         3.?3   1.26   0.69     2.37   1.62   1.08     2.58   1.50   1.18
22         1.25   1.10   0.84     1.88   1.41   1.08     3.73   2.08   0.93
X          NA     NA     NA       NA     NA     NA       3.12   1.64   0.72
Y          NA     NA     NA       NA     NA     NA       NA     NA     NA
Genome     4.12   0.88   0.00     3.75   1.22   0.17     4.99   1.55   0.32
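
The Table 12 caption describes rates calculated in 3-Mb windows from markers placed on the assembly. One simple way to obtain such a rate is sketched below. It assumes each marker carries a physical position (bp) and a genetic position (cM) and takes the rate in a window as the genetic span divided by the physical span of the markers inside it. This is an illustrative simplification, not the procedure actually used, and all names and toy values are hypothetical.

```python
def window_rates(markers, window_bp=3_000_000):
    """Recombination rate (cM/Mb) per window from (position_bp, position_cM) markers."""
    by_window = {}
    for bp, cm in sorted(markers):
        by_window.setdefault(bp // window_bp, []).append((bp, cm))

    rates = {}
    for index, points in by_window.items():
        if len(points) < 2:
            continue  # a single marker gives no interval to measure
        span_mb = (points[-1][0] - points[0][0]) / 1e6
        if span_mb > 0:
            rates[index] = (points[-1][1] - points[0][1]) / span_mb
    return rates


# Toy markers (hypothetical): three in the first 3-Mb window, two in the second.
markers = [
    (0, 0.0),
    (1_000_000, 1.0),
    (2_000_000, 3.0),
    (3_500_000, 4.0),
    (5_500_000, 5.0),
]
rates = window_rates(markers)
print(rates)
```

The per-chromosome maximum, average, and minimum columns of Table 12 would then be summaries of such window rates.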

We searched the complete set of Otto-predicted
transcripts against the genomic sequence
by means of BLAST. Genomic regions
corresponding to all Otto-predicted
transcripts were excluded from this analysis.
We identified 2909 regions matching with
greater than 70% identity over at least 70% of
the length of the transcripts that likely represent
processed pseudogenes. This number is
probably an underestimate because specific
methods to search for pseudogenes were not
used.

We looked for correlations between 
structural elements and the propensity for 
retrotransposition in the human genome. 
GC content and transcript length were com- 
pared between the genes with processed 
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pseudogenes (1177 source genes) versus 
the remainder of the predicted gene set. 
Transcripts that give rise to processed pseu- 
dogenes have shorter average transcript 
length (1027 bp versus 1594 bp for the Otto 
set) as compared with genes for which no 
pseudogene was detected. The overall GC 
content did not show any significant differ- 
ence, contrary to a recent report (88). There 
is a clear trend in gene families that are
present as processed pseudogenes. These
include ribosomal proteins (67%), lamin
receptors (10%), translation elongation factor
alpha (5%), and HMG-non-histone proteins
(2%). The increased occurrence of
retrotransposition (both intronless paralogs
and processed pseudogenes) among genes
involved in translation and nuclear regulation
may reflect an increased transcriptional
activity of these genes.

5.3 Gene duplication in the human
genome

Building on a previously published procedure 
(27), we developed a graph-theoretic algo- 
rithm, called Lek, for grouping the predicted 
human protein set into protein families (89), 



Table 13. Characteristics of CpG islands identified in chromosome 22 (34-Mbp sequence length) and the
whole genome (2.9-Gbp sequence length) by means of two different methods. Method 1 uses a CG
likelihood ratio of ≥0.6. Method 2 uses a CG likelihood ratio of ≥0.8.

                                                       Chromosome 22         Whole genome (CS assembly)
                                                    Method 1   Method 2        Method 1   Method 2
Number of CpG islands detected                         5,211        522         195,706     26,876
Average length of island (bp)                            390        535             395        497
Percent of sequence predicted as CpG                     5.9        0.8             2.6        0.4
Percent of first exons that overlap a CpG island          44         25              42         22
Percent of first exons with first position of
  exon contained inside a CpG island                      37         22              40         21
Average distance between first exon and
  closest CpG island (bp)                              1,013     10,486           2,182     17,021
Expected distance between first exon and
  closest CpG island (bp)                              3,262     32,567           7,164     55,811

Distribution of repetitive DNA in the compartmentalized shotgun assembly sequence.

Repetitive elements                             Megabases in     Percent of    Previously
                                                assembled        assembly      predicted
                                                sequences                      (%) (83)

Alu                                                  288             9.9          10.0
Mammalian interspersed repeat (MIR)                   66             2.3           1.7
Medium reiteration (MER)                              50             1.7           1.6
Long terminal repeat (LTR)                           155             5.3           5.6
Long interspersed nucleotide element (LINE)          466            16.1          16.7
Total                                               1025            35.3          35.6



The complete clusters that result from the 
Lek clustering provide one basis for compar- 
ing the role of whole-genome or chromosom- 
al duplication in protein family expansion as 
opposed to other means, such as tandem duplication.
Because each complete cluster represents
a closed and certain island of homology,
and because Lek is capable of
simultaneously clustering protein complements of
several organisms, the number of proteins 
contributed by each organism to a complete 
cluster can be predicted with confidence de- 
pending on the quality of the annotation of 
each genome. The variance of each organ- 
ism's contribution to each cluster can then be 
calculated, allowing an assessment of the relative
importance of large-scale duplication
versus smaller-scale, organism-specific ex- 
pansion and contraction of protein families, 
presumably as a result of natural selection 
operating on individual protein families with- 
in an organism. As can be seen in Fig. 12, the 
large variance in the relative numbers of hu- 
man as compared with D. melanogaster and 
Caenorhabditis elegans proteins in complete 
clusters may be explained by multiple events 
of relative expansions in gene families in 
each of the three animal genomes. Such ex- 
pansions would give rise to the distribution 
that shows a peak at 1 : 1 in the ratio for 
human-worm or human-fly clusters with the 
slope spread covering both human and fly/ 
worm predominance, as we observed (Fig. 
12). Furthermore, there are nearly as many
clusters where worm and fly proteins pre- 
dominate despite the larger numbers of pro- 
teins in the human. At face value, this anal- 
ysis suggests that natural selection acting on 
individual protein families has been a major 
force driving the expansion of at least some 
elements of the human protein set. However,
in our analysis, the difference between an 
ancient whole-genome duplication followed 
by loss, versus piecemeal duplication, cannot 
be easily distinguished. In order to differen- 
tiate these scenarios, more extended analyses 
were performed. 

5.4 Large-scale duplications 

Using two independent methods, we 
searched for large-scale duplications in the 
human genome. First, we describe a protein 
family-based method that identified highly 
conserved blocks of duplication. We then
describe our comprehensive method for identifying
all interchromosomal block duplications.
The latter method identified a large number of 
duplicated chromosomal segments covering 
parts of all 24 chromosomes. 

The first of the methods is based on the 
idea of searching for blocks of highly con- 
served homologous proteins that occur in 
more than one location on the genome. For 
this comparison, two genes were considered 
equivalent if their protein products were determined
to be in the same family and
same complete Lek cluster (essentially
paralogous genes) (89). Initially, each
chromosome was represented as a string of genes
ordered by the start codons for predicted 
genes along the chromosome. We considered 
the two strands as a single string, because 
local inversions are relatively common events 
relative to large-scale duplications. Each 
gene was indexed according to the protein 
family and Lek complete cluster (89). All 
pairs of indexed gene strings were then
aligned in both the forward and reverse directions
with the Smith-Waterman algorithm
(90). A match between two proteins of the
same Lek complete cluster was given a score 
of 10 and a mismatch -10, with gap open 
and extend penalties of -4 and -1. With 
these parameters, 19 conserved interchromo- 
somal blocks of duplication were observed, 
all of which were also detected and expanded 
by the comprehensive method described be- 
low. The detection of only a relatively small 
number of block duplications was a conse- 
quence of using an intrinsically conservative 
method grounded in the conservative con- 
straints of the complete Lek clusters. 
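
The gene-string alignment described above can be sketched in a few lines. This is a minimal illustration, not the implementation used for the analysis: the function names are hypothetical, genes are represented simply as cluster identifiers, and the affine-gap convention (the first gapped position costs the open penalty, each further position the extend penalty) is an assumption, since the text gives only the penalty values.

```python
from typing import Hashable, Sequence


def smith_waterman_score(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: int = 10,
    mismatch: int = -10,
    gap_open: int = -4,
    gap_extend: int = -1,
) -> int:
    """Best local alignment score between two strings of cluster identifiers.

    Assumed gap convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    # h[i][j]: best local score of an alignment ending at a[i-1], b[j-1]
    h = [[0] * (m + 1) for _ in range(n + 1)]
    # e / f: best score ending in a gap that consumes an element of b / of a
    e = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    f = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e[i][j] = max(h[i][j - 1] + gap_open, e[i][j - 1] + gap_extend)
            f[i][j] = max(h[i - 1][j] + gap_open, f[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


def best_orientation_score(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Align in both forward and reverse directions and keep the better score."""
    return max(smith_waterman_score(a, b), smith_waterman_score(a, b[::-1]))
```

Because a mismatch costs as much as a match gains, only runs of genes from the same cluster, interrupted by short gaps, accumulate a high score, which is consistent with the conservative behavior noted above.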

In the second, more comprehensive ap- 
proach, we aligned all chromosomes directly 
with one another using an algorithm based on 
the MUMmer system (91). This alignment
method uses a suffix tree data structure and a 
linear-time algorithm to align long sequences 
very rapidly; for example, two chromosomes 
of 100 Mbp can be aligned in less than 20 
min (on a Compaq Alpha computer) with 4 
gigabytes of memory. This procedure was 
used recently to identify numerous large- 
scale segmental duplications among the five 
chromosomes of A. thaliana (92); in that 
organism, the method revealed that 60% of 
the genome (66 Mbp) is covered by 24 very 
large duplicated segments. For Arabidopsis, a 
DNA-based alignment was sufficient to re- 
veal the segmental duplications between 
chromosomes; in the human genome, DNA 
alignments at the whole-chromosome level 
are insufficiently sensitive. Therefore, a mod- 
ified procedure was developed and applied, 
as follows. First, all 26,588 proteins 
(9,675,713 amino acids) were concatenated
end-to-end in order as they occur
along each of the 24 chromosomes, irrespec- 
tive of strand location. The concatenated pro- 
tein set was then aligned against each chro- 
mosome by the MUMmer algorithm. The 
resulting matches were clustered to extract all 
sets of three or more protein matches that 
occur in close proximity on two different 
chromosomes (93); these represent the can- 
didate segmental duplications. A series of 
filters were developed and applied to remove 
likely false-positives from this set; for exam- 
ple, small blocks that were spread across 
many proteins were removed. To refine the
filtering methods, a shuffled protein set was
first created by taking the 26,588 proteins,
randomizing their order, and then partitioning 
them into 24 shuffled chromosomes, each 
containing the same number of proteins as the 
true genome. This shuffled protein set has the 
identical composition to the real genome; in 
particular, every protein and every domain 
appears the same number of times. The com- 
plete algorithm was then applied to both the 
real and the shuffled data, with the results on
the shuffled data being used to estimate the
false-positive rate. The algorithm after filtering
yielded 10,310 gene pairs in 1077 duplicated
blocks containing 3522 distinct genes;
tandemly duplicated expansions in many of 
the blocks explain the excess of gene pairs to 
distinct genes. In the shuffled data, by con- 
trast, only 370 gene pairs were found, giving 
a false-positive estimate of 3.6%. The most 
likely explanation for the 1077 block dupli- 
cations is ancient segmental duplications. In 
many cases, the order of the proteins has been 
shuffled, although proximity is preserved. 
Out of the 1077 blocks, 159 contain only
three genes, 137 contain four genes, and 781 
contain five or more genes. 
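
The shuffled control and the resulting false-positive estimate can be illustrated as follows. This is a sketch under stated assumptions: the function names and the fixed random seed are hypothetical, and the detection pipeline itself (MUMmer alignment, clustering, filtering) is not reproduced; only the construction of the shuffled genome and the final ratio are shown.

```python
import random
from typing import List, Sequence


def shuffle_into_chromosomes(
    chromosomes: Sequence[Sequence[str]], seed: int = 0
) -> List[List[str]]:
    """Randomize protein order genome-wide, then re-partition the proteins so
    that each shuffled chromosome holds as many proteins as the real one."""
    pool = [protein for chrom in chromosomes for protein in chrom]
    random.Random(seed).shuffle(pool)
    shuffled, start = [], 0
    for chrom in chromosomes:
        shuffled.append(pool[start:start + len(chrom)])
        start += len(chrom)
    return shuffled


def false_positive_estimate(pairs_in_shuffled: int, pairs_in_real: int) -> float:
    """Fraction of reported gene pairs expected to arise by chance alone."""
    return pairs_in_shuffled / pairs_in_real


# Counts quoted in the text: 370 pairs in shuffled data, 10,310 in real data.
estimate = false_positive_estimate(370, 10310)
print(f"{estimate:.1%}")  # prints 3.6%
```

Keeping the per-chromosome protein counts and the overall composition fixed means that any block found in the shuffled data reflects chance proximity rather than shared ancestry.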

To illustrate the extent of the detected 
duplications, Fig. 13 shows all 1077 block
duplications indexed to each chromosome in 
24 panels in which only duplications mapped 
to the indexed chromosome are displayed. 
The figure makes it clear that the duplications 
are ubiquitous in the genome. One feature 
that it displays is many relatively small chro- 
mosomal stretches, with one-to-many dupli- 
cation relationships that are graphically strik- 
ing. One such example captured by the anal- 
ysis is the well-documented olfactory recep- 
tor (OR) family, which is scattered in blocks 
throughout the genome and which has been 
analyzed for genome-deployment reconstruction
at several evolutionary stages (94). The
figure also illustrates that some chromosomes,
such as chromosome 2, contain many
more detected large-scale duplications than 
others. Indeed, one of the largest duplicated 
segments is a large block of 33 proteins on
chromosome 2, spread among eight smaller 
blocks in 2p, that aligns to a paralogous set on 
chromosome 14, with one rearrangement (see 
chromosomes 2 and 14 panels in Fig. 13). 
The proteins are not contiguous but span a
region containing 97 proteins on chromosome 2
and 332 proteins on chromosome 14.
The likelihood of observing this many dupli- 
cated proteins by chance, even over a span of 
this length, is 2.3 X lO'^^ (93). This dupli- 
cated set spans 20 Mbp on chromosome 2 and 
63 Mbp on chromosome 14, over 70% of the 
latter chromosome. Chromosome 2 also con- 
tains a block duplication that is nearly as 
large, which is shared by chromosome arm 2q 
and chromosome 12. This duplication incor- 
porates two of the four known Hox gene 
clusters, but considerably expands the extent 
of the duplications proximally and distally on 
the pair of chromosome arms. This breadth of 
duplication is also seen on the two chromosomes
carrying the other two Hox clusters.

An additional large duplication, between 
chromosomes 18 and 20, serves as a good 
example to illustrate some of the features
common to many of the other observed large
duplications (Fig. 13, inset). This duplication 
contains 64 detected ordered intrachromo- 
somal pairs of homologous genes. After dis- 
counting a 40-Mb stretch of chromosome 18 
free of matches to chromosome 20, which is 
likely to represent a large insert (between the 
gene assignments "Krup rel" and "collagen 
rel" on chromosome 18 in Fig. 13), the full 
duplication segment covers 36 Mb on chro- 
mosome 18 and 28 Mb on chromosome 20. 




[Fig. 12 plot. Series: Human/Worm, Human/Fly. Axis: Ratio, from 5:1 (human predominant) to 1:5 (fly/worm predominant).]

Fig. 12. Gene duplication in complete protein clusters. The predicted protein sets of human, worm,
and fly were subjected to Lek clustering (27). The numbers of clusters with varying ratios (whole
number) of human versus worm and human versus fly proteins per cluster were plotted.
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By this measure, the duplication segment 
spans nearly half of each chromosome's net
length. The most likely scenario is that the
whole span of this region was duplicated as a
single very large block, followed by shuffling
owing to smaller scale rearrangements. As
such, at least four subsequent rearrangements
would need to be invoked to explain the
relative insertions and inversions seen in the
duplicated segment interval. The 64 protein
pairs in this alignment occur among 217 protein
assignments on chromosome 18 and
among 322 protein assignments on chromosome 20,
for a density of involved proteins of
20 to 30%. This is consistent with an ancient
large-scale duplication followed by subsequent
gene loss on one or both chromosomes.
Loss of just one member of a gene pair
subsequent to the duplication would result in
a failure to score a gene pair in the block; less
than 50% gene loss on the chromosomes
would lead to the duplication density observed
here. As an independent verification
of the significance of the alignments detected,
it can be seen that a substantial number of
the pairs of aligning proteins in this duplication,
including some of those annotated (Fig.
13), are those populating small Lek complete
clusters (see above). This indicates that they
are members of very small families of paralogs;
their relative scarcity within the genome
validates the uniqueness and robust nature of
their alignments.

Two additional qualitative features were observed
among many of the large-scale duplications.
First, several proteins with disease associations,
with OMIM (Online Mendelian Inheritance
in Man) assignments, are members of
duplicated segments (see web table 2 on Science
Online at www.sciencemag.org/cgi/content/full/291/5507/1304/DC1).
We have also
observed a few instances where paralogs on
both duplicated segments are associated with
similar disease conditions. Notable among
these genes are proteins involved in hemostasis
(coagulation factors) that are associated with
bleeding disorders, transcriptional regulators
like the homeobox proteins associated with developmental
disorders, and potassium channels
associated with cardiovascular conduction abnormalities.
For each of these disease genes,
closer study of the paralogous genes in the
duplicated segment may reveal new insights
into disease causation, with further investigation
needed to determine whether they might be
involved in the same or similar genetic diseases.
Second, although there is a conserved number
of proteins and coding exons predicted for specific
large duplicated spans within the chromosome
18 to 20 alignment, the genomic DNA of
chromosome 18 in these specific spans is in
some cases more than 10-fold longer than the
corresponding chromosome 20 DNA. This selective
accretion of noncoding DNA (or conversely,
loss of noncoding DNA) on one of a
pair of duplicated chromosome regions was
observed in many compared regions. Hypotheses
to explain which mechanisms foster these
processes must be tested.

Evaluation of the alignment results gives
some perspective on dating of the duplications.
As noted above, large-scale ancient segmental
duplication in fact best explains many of the
blocks detected by this genome-wide analysis.
The regions of human chromosomes involved
in the large-scale duplications expanded upon
above (chromosomes 2 to 14, 2 to 12, and 18 to
20) are each syntenic to a distinct mouse chromosomal
region. The corresponding mouse
chromosomal regions are much more similar in
sequence conservation, and even in order, to
their human synteny partners than the human
duplication regions are to each other. Further,
the corresponding mouse chromosomal regions
each bear a significant proportion of genes orthologous
to the human genes on which the
human duplication assignments were made. On
the basis of these factors, the corresponding
mouse chromosomal spans, at coarse resolution,
appear to be products of the same large-scale
duplications observed in humans. Although
further detailed analysis must be carried
out once a more complete genome is assembled
for mouse, the underlying large duplications
appear to predate the two species' divergence.
This dates the duplications, at the latest, before
divergence of the primate and rodent lineages.
This date can be further refined upon examination
of the synteny between human chromosomes
and those of chicken, pufferfish (Fugu
rubripes), or zebrafish (95). The only substantial
syntenic stretches mapped in these
species corresponding to both pairs of human
duplications are restricted to the Hox cluster
regions. When the synteny of these regions
(or others) to human chromosomes is extended
with further mapping, the ages of the
nearly chromosome-length duplications seen
in humans are likely to be dated to the root of
vertebrate divergence.

The MUMmer-based results demonstrate
large block duplications that range in size from
a few genes to segments covering most of a
chromosome. The extent of segmental duplications
raises the question of whether an ancient
whole-genome duplication event is the underlying
explanation for the numerous duplicated
regions (96). The duplications have undergone
many deletions and subsequent rearrangements;
these events make it difficult to distinguish
between a whole-genome duplication and multiple
smaller events. Further analysis, focused
especially on comparing the estimated ages of
all the block duplications, derived partially
from interspecies genome comparisons, will be
necessary to determine which of these two hypotheses
is more likely. Comparisons of genomes
of different vertebrates, and even cross-phyla
genome comparisons, will allow for the
deconvolution of duplications to eventually reveal
the stagewise history of our genome and
with it a history of the emergence of many of
the key functions that distinguish us from other
living things.

6 A Genome-Wide Examination of 
Sequence Variations 

Summary. Computational methods were used
to identify single-nucleotide polymorphisms
(SNPs) by comparison of the Celera sequence
to other SNP resources. The SNP rate between
two chromosomes was ~1 per 1200 to
1500 bp. SNPs are distributed nonrandomly
throughout the genome. Only a very small
proportion of all SNPs (<1%) potentially
impact protein function based on the functional
analysis of SNPs that affect the predicted
coding regions. This results in an estimate
that only thousands, not millions, of
genetic variations may contribute to the structural
diversity of human proteins.

Having a complete genome sequence enables
researchers to achieve a dramatic acceleration
in the rate of gene discovery, but only through
analysis of sequence variation in DNA can we
discover the genetic basis for variation in health
among human beings. Whole-genome shotgun
sequencing is a particularly effective method
for detecting sequence variation in tandem with
whole-genome assembly. In addition, we compared
the distribution and attributes of SNPs
ascertained by three other methods: (i) alignment
of the Celera consensus sequence to the
PFP assembly, (ii) overlap of high-quality reads
of genomic sequence (referred to as "Kwok";
1,120,195 SNPs) (97), and (iii) reduced representation
shotgun sequencing (referred to as
"TSC"; 632,640 SNPs) (98). These data were
consistent in showing an overall nucleotide diversity
of ~8 × 10^-4, marked heterogeneity
across the genome in SNP density, and an
overwhelming preponderance of noncoding
variation that produces no change in expressed
proteins.



6.1 SNPs found by aligning the Celera 
consensus to the PFP assembly 

Ideally, methods of SNP discovery make full
use of sequence depth and quality at every site,
and quantitatively control the rate of false-positive
and false-negative calls with an explicit
sampling model (99). Comparison of consensus
sequences in the absence of these details necessitated
a more ad hoc approach (quality scores
could not readily be obtained for the PFP assembly).
First, all sequence differences between
the two consensus sequences were identified;
these were then filtered to reduce the contribution
of sequencing errors and misassembly. As
a measure of the effectiveness of the filtering
step, we monitored the ratio of transition and
transversion substitutions, because a 2:1 ratio
has been well documented as typical in mammalian
evolution (100) and in human SNPs
[...] removing variants where the quality score in the
Celera consensus was less than 30 and where
the density of variants was greater than 5 in 400
bp. These filters resulted in shifting the transition-to-transversion
ratio from 1.57:1 to
1.89:1. When applied to 2.3 Gbp of alignments
between the Celera and PFP consensus sequences,
these filters resulted in identification
of 2,104,820 putative SNPs from a total of
2,778,474 substitution differences. Overlaps
between this set of SNPs and those found by
other methods are described below.

6.2 Comparisons to public SNP
databases

Additional SNPs, including 2,536,021 from
dbSNP (www.ncbi.nlm.nih.gov/SNP) and
13,150 from HGMD (Human Gene Mutation
Database, from the University of
Wales, UK), were mapped on the Celera consensus
sequence by a sequence similarity
search with the program PowerBlast (103). The
two largest data sets in dbSNP are the Kwok
and TSC sets, with 47% and 25% of the dbSNP
records. Low-quality alignments with partial
coverage of the dbSNP sequence and alignments
that had less than 98% sequence identity
between the Celera sequence and the dbSNP
flanking sequence were eliminated. dbSNP sequences
mapping to multiple locations on the
Celera genome were discarded. A total of
2,336,935 dbSNP variants were mapped to
1,223,038 unique locations on the Celera sequence,
implying considerable redundancy in
dbSNP. SNPs in the TSC set mapped to
585,811 unique genomic locations, and SNPs in
the Kwok set mapped to 438,032 unique locations.
The combined unique SNP count used
in this analysis, including Celera-PFP, TSC,
and Kwok, is 2,737,668. Table 15 shows that a
substantial fraction of SNPs identified by one of
these methods was also found by another method.
The very high overlap (36.2%) between the
Kwok and Celera-PFP SNPs may be due in part
to the use by Kwok of sequences that went into
the PFP assembly. The unusually low overlap
(16.4%) between the Kwok and TSC sets is due
[...] SNPs derived from the Celera genome sequences
(46). SNP validation in population
samples is an expensive and laborious process,
so confirmation on multiple data sets may provide
an efficient initial validation "in silico" (by
computational analysis).

One means of assessing whether the
three sets of SNPs provide the same picture
of human variation is to tally the frequencies
[...] each set of SNPs (Table 16). Previous measures
of nucleotide diversity were mostly
derived from small-scale analysis on candidate
genes (101), and our analysis with
all three data sets validates the previous
observations at the whole-genome scale.
There is remarkable homogeneity between
the SNPs found in the Kwok set, the TSC
set, and in our whole-genome shotgun (46)
in this substitution pattern. Compared with
the rest of the data sets, Celera-PFP deviates
slightly from the 2:1 transition-to-transversion
ratio observed in the other
SNP sets. This result is not unexpected,
because some fraction of the computationally
identified SNPs in the Celera-PFP
comparison may in fact be sequence errors.
A 2:1 transition:transversion ratio for the
bona fide SNPs would be obtained if one
assumed that 15% of the sequence differences
in the Celera-PFP set were a result of
(presumably random) sequence errors.

Table 15. Overlap of SNPs from genome-wide
SNP databases. Table entries are SNP counts for
each pair of data sets. Numbers in parentheses are
the fraction of overlap, calculated as the count of
overlapping SNPs divided by the number of SNPs
in the smaller of the two databases compared.
Total SNP counts for the databases are: Celera-PFP,
2,104,820; TSC, 585,811; and Kwok, 438,032.
Only unique SNPs in the TSC and Kwok data sets
were included.

6.3 Estimation of nucleotide diversity
from ascertained SNPs

The number of SNPs identified varied
widely across chromosomes. In order to
normalize these values to the chromosome
size and sequence coverage, we used π, the
standard statistic for nucleotide diversity
(104). Nucleotide diversity is a measure of
per-site heterozygosity, quantifying the
probability that a pair of chromosomes
drawn from the population will differ at a
nucleotide site. In order to calculate nucleotide
diversity for each chromosome, we
need to know the number of nucleotide
sites that were surveyed for variation, and
in methods like reduced representation sequencing,
we need to know the sequence
quality and the depth of coverage at each
[...]. These data are not readily available, so
we could not estimate nucleotide diversity
from the TSC effort. Estimation of nucleotide
diversity from high-quality sequence
overlaps should be possible, but again,
more information is needed on the details
of all the alignments.

Estimation of nucleotide diversity from a
shotgun assembly entails calculating for each
column of the multialignment, the probability
that two or more distinct alleles are present
[...] fact the alleles have different sequence (i.e.,
the probability of correct sequence calls). The
greater the depth of coverage and the higher
the sequence quality, the higher is the chance
of successfully detecting a SNP (105). Even
after correcting for variation in coverage, the
nucleotide diversity appeared to vary across
autosomes. The significance of this heterogeneity
was tested by analysis of variance, with
estimates of π for 100-kbp windows to estimate
variability within chromosomes (for the
Celera-PFP comparison, F = 2913 P <
0.0001).
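
The analysis of variance mentioned above compares the spread of window-level diversity estimates between chromosomes with the spread within them. The following is a minimal sketch of a one-way ANOVA F statistic, assuming each chromosome contributes a list of per-window π estimates; the function name is hypothetical and this is not the code used to obtain the value quoted in the text.

```python
from typing import List, Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic for a one-way ANOVA.

    groups: one sequence of per-window diversity estimates per chromosome.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means: List[float] = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far each chromosome mean sits from the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: window-to-window scatter inside each chromosome
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A large F indicates that chromosome means differ by more than the window-to-window scatter within chromosomes would explain.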

Average diversity for the autosomes estimated
from the Celera-PFP comparison
was 8.94 × 10^-4. Nucleotide diversity on
the X chromosome was 6.54 × 10^-4. The
X is expected to be less variable than autosomes,
because for every four copies of
autosomes in the population, there are only
three X chromosomes, and this smaller effective
population size means that random
drift will more rapidly remove variation
from the X (106).
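
The three-to-four argument above can be checked against the two estimates quoted in the text. The sketch below assumes the simplest neutral expectation (equal numbers of males and females, equal mutation rates on X and autosomes), under which diversity scales with the number of chromosome copies; the helper name is hypothetical.

```python
def pairwise_diversity(differences: int, sites_compared: int) -> float:
    """Per-site heterozygosity: differing sites divided by sites surveyed."""
    return differences / sites_compared


PI_AUTOSOMES = 8.94e-4  # autosomal estimate quoted in the text
PI_X = 6.54e-4          # X chromosome estimate quoted in the text

observed_ratio = PI_X / PI_AUTOSOMES
expected_ratio = 3 / 4  # three X copies for every four autosome copies

print(f"observed X/autosome ratio: {observed_ratio:.2f}")  # prints 0.73
print(f"expected under the assumptions above: {expected_ratio:.2f}")  # prints 0.75
```

The observed ratio of about 0.73 lies close to the 0.75 expected from copy number alone.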

Having ascertained nucleotide variation
genome-wide, it appears that previous estimates
of nucleotide diversity in humans
based on samples of genes were reasonably
accurate (101, 102, 106, 107). Genome-wide,
our estimate of nucleotide diversity was
8.98 × 10^-4 for the Celera-PFP alignment,
and a published estimate averaged over 10
densely resequenced human genes was
8.00 × 10^-4 (108).



Table 16. Summary of nucleotide changes in different SNP data sets.

Fig. 13. Segmental duplications between chromosomes
in the human genome. The 24 panels show
the 1077 duplicated blocks of genes, containing 10,310
pairs of genes in total. Each line represents a pair of
homologous genes belonging to a block; all blocks
contain at least three genes on each of the chromosomes
where they appear. Each panel shows all the
duplications between a single chromosome and
other chromosomes with shared blocks. The chromosome
at the center of each panel is shown as a
thick red line for emphasis. Other chromosomes are
displayed from top to bottom within each panel
ordered by chromosome number. The inset (bottom,
center right) shows a close-up of one duplication
between chromosomes 18 and 20, expanded
to display the gene names of 12 of the 64
gene pairs shown.

6.4 Variation in nucleotide diversity
across the human genome

Such an apparently high degree of variability
among chromosomes in SNP density
raises the question of whether there is heterogeneity
at a finer scale within chromosomes,
and whether this heterogeneity is
greater than expected by chance. If SNPs
occur by random and independent mutations,
then it would seem that there ought to be a
Poisson distribution of numbers of SNPs in
fragments of arbitrary constant size. The observed
dispersion in the distribution of SNPs
in 100-kbp fragments was far greater than
predicted from a Poisson distribution (Fig.
14). However, this simplistic model ignores
the different recombination rates and population
histories that exist in different regions of
the genome. Population genetics theory holds
that we can account for this variation with a
mathematical formulation called the neutral
coalescent (109). Applying well-tested algorithms
for simulating the neutral coalescent
with recombination (110), and using an effective
population size of 10,000 and a per-base
recombination rate equal to the mutation
rate (111), we generated a distribution of numbers
of SNPs by this model as well (112). The
observed distribution of SNPs has a much larger
variance than either the Poisson model or the
coalescent model, and the difference is highly
significant. This implies that there is significant
variability across the genome in SNP density,
an observation that begs an explanation.
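
The Poisson comparison above rests on a simple property: for independent events, the variance of the per-window count equals its mean. The sketch below tallies SNPs in fixed-size windows and reports the variance-to-mean ratio; the function names and the 0-based position convention are assumptions made for illustration, and the coalescent simulation is not reproduced.

```python
from statistics import mean, variance
from typing import Iterable, List


def window_counts(positions: Iterable[int], window: int, length: int) -> List[int]:
    """Count SNPs per fixed-size window (positions are 0-based and < length)."""
    n_windows = -(-length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


def dispersion_index(counts: List[int]) -> float:
    """Sample variance divided by mean; close to 1 for Poisson counts."""
    return variance(counts) / mean(counts)
```

A ratio well above 1, as reported for 100-kbp windows, signals clustering of SNPs beyond what independent mutation would produce.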

Several attributes of the DNA sequence
may affect the local density of SNPs, including
the rate at which DNA polymerase
makes errors and the efficacy of mismatch
repair. One key factor that is likely to be
associated with SNP density is the G+C
content, in part because methylated cytosines
in CpG dinucleotides tend to undergo
deamination to form thymine, accounting
for a nearly 10-fold increase in the
mutation rate of CpGs over other dinucleotides.
We tallied the GC content and nucleotide
diversities in 100-kbp windows
across the entire genome and found that the
correlation between them was positive (r =
0.21) and highly significant (P < 0.0001),
but G+C content accounted for only a
small part of the variation.
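
The statement that G+C content explains only a small part of the variation follows from squaring the correlation coefficient. The sketch below computes a GC fraction per window, a Pearson correlation, and the implied share of variance; the function names are hypothetical and the window data themselves are not part of the source.

```python
from math import sqrt
from typing import Sequence


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases in a window."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / called


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# With the reported r = 0.21, the share of variance in diversity
# associated with GC content is r squared.
variance_explained = 0.21 ** 2
print(f"{variance_explained:.1%}")  # prints 4.4%
```

An r of 0.21 therefore corresponds to roughly 4% of the window-to-window variance in diversity.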

6.5 SNPs by genomic class

To test homogeneity of SNP densities
across functional classes, we partitioned
sites into intergenic (defined as >5 kbp
from any predicted transcription unit), 5'-UTR,
exonic (missense and silent), intronic,
and 3'-UTR for 10,239 known
genes, derived from the NCBI RefSeq database
and all human genes predicted from
the Celera Otto annotation. In coding regions,
SNPs were categorized as either silent,
for those that do not change amino
acid sequence, or missense, for those that
change the protein product. The ratio of
missense to silent coding SNPs in Celera-PFP,
TSC, and Kwok sets (1.12, 0.91, and
0.78, respectively) shows a markedly reduced
frequency of missense variants compared
with the neutral expectation, consistent
with the elimination by natural selection
of a fraction of the deleterious amino
acid changes (112). These ratios are comparable
to the missense-to-silent ratios of
0.88 and 1.17 found by Cargill et al. (101)
and by Halushka et al. (102). Similar results
were observed in SNPs derived from
Celera shotgun sequences (46).
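
The silent versus missense distinction can be made concrete with the standard genetic code. The sketch below is illustrative only: the function names are hypothetical, the codon table is the standard nuclear code, and, following the two-way definition in the text, any substitution that changes the encoded product (including a change to or from a stop codon) is labeled missense.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Return 'silent' if the substitution leaves the amino acid unchanged,
    otherwise 'missense'. position is 0, 1, or 2 within the codon."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    return "silent" if CODON_TABLE[codon] == CODON_TABLE[mutated] else "missense"


def missense_to_silent_ratio(labels: list) -> float:
    """Ratio of missense to silent calls in a list of classifications."""
    return labels.count("missense") / labels.count("silent")
```

For example, CTT to CTC leaves leucine unchanged and is silent, whereas GAA to AAA replaces glutamate with lysine and is missense.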

It is striking how small is the fraction of
SNPs that lead to potentially dysfunctional
alterations in proteins. In the 10,239 RefSeq
genes, missense SNPs were only about
0.12, 0.14, and 0.17% of the total SNP
counts in Celera-PFP, TSC, and Kwok
SNPs, respectively. Nonconservative protein
changes constitute an even smaller fraction
of missense SNPs (47, 41, and 40% in
Celera-PFP, Kwok, and TSC). Intergenic regions
have been virtually unstudied (113), and
we note that 75% of the SNPs we identified
were intergenic (Table 17). The SNP rate was
highest in introns and lowest in exons. The SNP
rate was lower in intergenic regions than in
introns, providing one of the first discriminators
between these two classes of DNA. These SNP
rates were confirmed in the Celera SNPs, which
also exhibited a lower rate in exons than in
introns, and in extragenic regions than in introns
(46). Many of these intergenic SNPs will
provide valuable information in the form of
markers for linkage and association studies, and
some fraction is likely to have a regulatory
function as well.

[Fig. 14 plot. Axis label: Number of SNPs / 100 kb.]

Fig. 14. SNP density in each 100-kbp interval as determined with Celera-PFP SNPs. The color codes
are as follows: black, Celera-PFP SNP density; blue, coalescent model; and red, Poisson distribution.
The figure shows that the distribution of SNPs along the genome is nonrandom and is not entirely
accounted for by a coalescent model of regional history.

7 An Overview of the Predicted 
Protein-Coding Genes in the Human 
Genome 

Summary. This section provides an initial computational analysis of the
predicted protein set with the aim of cataloging prominent differences and
similarities when the human genome is compared with other fully sequenced
eukaryotic genomes. Over 40% of the predicted protein set in humans cannot
be ascribed a molecular function by methods that assign proteins to known
families. A protein domain-based analysis provides a detailed catalog of
the prominent differences in the human genome when compared with the fly
and worm genomes. Prominent among these are domain expansions in proteins
involved in developmental regulation and in cellular processes such as
neuronal function, hemostasis, acquired immune response, and cytoskeletal
complexity. The final enumeration of protein families and details of
protein structure will rely on additional experimental work and
comprehensive manual curation.

A preliminary analysis of the predicted human protein-coding genes was
conducted. Two methods were used to analyze and classify the molecular
functions of 26,588 predicted proteins that represent 26,383 gene
predictions with at least two lines of evidence as described above. The
first method was based on an analysis at the level of protein families,
with both the publicly available Pfam database (114, 115) and Celera's
Panther Classification (CPC) (Fig. 15) (116). The second method was based
on an analysis at the level of protein domains, with both the Pfam and
SMART databases (115, 117).

The results presented here are preliminary and are subject to several
limitations. Both the gene predictions and functional assignments have been
made by using computational tools, although the statistical models in
Panther, Pfam, and SMART have been built, annotated, and reviewed by expert
biologists. In the set of computationally predicted genes, we expect both
false-positive predictions (some of these may in fact be inactive
pseudogenes) and false-negative predictions (some human genes will not be
computationally predicted). We also expect errors in delimiting the
boundaries of exons and genes. Similarly, in the automatic functional
assignments, we also expect both false-positive and false-negative
predictions. The functional assignment protocol focuses on protein families
that tend to be found across several organisms, or on families of known
human genes. Therefore, we do not assign a function to many genes that are
not in large families, even if the function is known. Unless otherwise
specified, all enumeration of the genes in any given family or functional
category was taken from the set of 26,588 predicted proteins, which were
assigned functions by using statistical score cutoffs defined for models in
Panther, Pfam, and SMART.

For this initial examination of the predicted human protein set, three
broad questions were asked: (i) What are the likely molecular functions of
the predicted gene products, and how are these proteins categorized with
current classification methods? (ii) What are the core functions that
appear to be common across the animals? (iii) How does the human protein
complement differ from that of other sequenced eukaryotes?

7.1 Molecular functions of predicted 
human proteins 

Figure 15 shows an overview of the puta- 
tive molecular functions of the predicted 
26,588 human proteins that have at least 
two lines of supporting evidence. About 
41% (12,809) of the gene products could
not be classified from this initial analysis 
and are termed proteins with unknown 
functions. Because our automatic classifi- 
cation methods treat only relatively large 
protein families, there are a number of 
"unclassified" sequences that do, in fact, 
have a known or predicted function. For the 
60% of the protein set that have automatic 
functional predictions, the specific protein 
functions have been placed into broad 
classes. We focus here on molecular func- 
tion (rather than higher order cellular pro- 
cesses) in order to classify as many proteins 
as possible. These functional predictions 
are based on similarity to sequences of 
known function. 

In our analysis of the 12,731 additional low-confidence predicted genes
(those with only one piece of supporting evidence), only 636 (5%) of these
additional putative genes were assigned molecular functions by the
automated methods. One-third of these 636 predicted genes represented
endogenous retroviral proteins, further suggesting that the majority of
these unknown-function genes are not real genes. Given that most of these
additional 12,095 genes appear to be unique among the genomes sequenced to
date, many may simply represent false-positive gene predictions.

The most common molecular functions are the transcription factors and those
involved in nucleic acid metabolism (nucleic acid enzyme). Other functions
that are highly represented in the human genome are the receptors, kinases,
and hydrolases. Not surprisingly, most of the hydrolases are proteases.
There are also many proteins that are members of proto-oncogene families,
as well as families of "select regulatory molecules": (i) proteins involved
in specific steps of signal transduction such as heterotrimeric GTP-binding
proteins (G proteins) and cell cycle regulators, and (ii) proteins that
modulate the activity of kinases, G proteins, and phosphatases.

Table 17. Distribution of SNPs in classes of 
genomic regions. 

Fig. 15. Distribution of the molecular functions of 26,383 human genes.
Each slice lists the numbers and percentages (in parentheses) of human gene
functions assigned to a given category of molecular function. The outer
circle shows the assignment to molecular function categories in the Gene
Ontology (GO) (179), and the inner circle shows the assignment to Celera's
Panther molecular function categories (116).



Panther categories 
7.2 Evolutionary conservation of core
processes

Because of the various "model organism" genome-sequencing projects that
have already been completed, reasonable comparative information is
available for beginning the analysis of the evolution of the human genome.
The genomes of S. cerevisiae ("baker's yeast") (118) and two diverse
invertebrates, C. elegans (a nematode worm) (119) and D. melanogaster (fly)
(26), as well as the first plant genome, A. thaliana, recently completed
(92), provide a diverse background for genome comparisons.

We enumerated the "strict orthologs" conserved between human and fly, and
between human and worm (Fig. 16) to address the question, What are the core
functions that appear to be common across the animals? The concept of
orthology is important because if two genes are orthologs, they can be
traced by descent to the common ancestor of the two organisms (an
"evolutionarily conserved protein set"), and therefore are likely to
perform similar conserved functions in the different organisms. It is
critical in this analysis to separate orthologs (a gene that appears in two
organisms by descent from a common ancestor) from paralogs (a gene that
appears in more than one copy in a given organism by a duplication event)
because paralogs may subsequently diverge in function. Following the
yeast-worm ortholog comparison in



Fig. 16. Functions of putative orthologs across vertebrate and invertebrate
genomes. Each slice lists the number and percentages (in parentheses) of
"strict orthologs" between the human, fly, and worm genomes involved in a
given category of molecular function. "Strict orthologs" are defined here
as bi-directional BLAST best hits (180) such that each orthologous pair (i)
has a BLASTP P-value of ≤10^-10 (120), and (ii) has a more significant
BLASTP score than any paralogs in either organism, i.e., there has likely
been no duplication subsequent to speciation that might make the orthology
ambiguous. This measure is quite strict and is a lower bound on the number
of orthologs. By these criteria, there are 2758 strict human-fly orthologs,
and 2031 human-worm orthologs (1523 in common between these sets).




(120), we identified two different cases for each pairwise comparison
(human-fly and human-worm). The first case was a pair of genes, one from
each organism, for which there was no other close homolog in either
organism. These are straightforwardly identified as orthologous, because
there are no additional members of the families that complicate separating
orthologs from paralogs. The second case is a family of genes with more
than one member in either or both of the organisms being compared. Chervitz
et al. (120) dealt with this case by analyzing a phylogenetic tree that
described the relationships between all of the sequences in both organisms,
and then looking for pairs of genes that were nearest neighbors in the
tree. If the nearest-neighbor pairs were from different organisms, those
genes were presumed to be orthologs. We note that these nearest neighbors
can often be confidently identified from pairwise sequence comparison
without having to examine a phylogenetic tree (see legend to Fig. 16). If
the nearest neighbors are not from different organisms, there has been a
paralogous expansion in one or both organisms after the speciation event
(and/or a gene loss by one organism). When this one-to-one correspondence
is lost, defining an ortholog becomes ambiguous. For our initial
computational overview of the predicted human protein set, we could not
answer this question for every predicted protein. Therefore, we consider
only "strict orthologs," i.e., the proteins with unambiguous one-to-one
relationships (Fig. 16). By these criteria, there are 2758 strict human-fly
orthologs and 2031 human-worm orthologs (1523 in common between these
sets). We define the evolutionarily conserved set as those 1523 human
proteins that have strict orthologs in both D. melanogaster and C. elegans.

[Fig. 16, slice labels: category (number, percentage)]
cytoskeletal structural protein (20, 1.2%)
chaperone (16, 0.9%)
cell adhesion (11, 0.6%)
miscellaneous (72, 4.2%)
viral protein (4, 0.2%)
transfer/carrier protein ( M , 0.6%)
transcription factor (81, 4.7%)
nucleic acid enzyme (221, 12.9%)
receptor (23, 1.3%)
kinase (69, 4.0%)
select regulatory molecule (88, 5.1%)
transferase (70, 4.1%)
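
The "strict ortholog" idea described above, bi-directional best hits followed by an intersection of the human-fly and human-worm sets, can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is taken to be a list of hypothetical `(query, subject, p_value)` tuples already parsed from BLASTP output, the P-value cutoff is a parameter whose default of 1e-10 is an assumption, and the additional check against within-species paralogs is omitted.

```python
def best_hits(hits, max_pvalue=1e-10):
    """Map each query to its single most significant subject.

    `hits` is an iterable of (query, subject, p_value) tuples; hits with a
    P-value above `max_pvalue` are ignored.
    """
    best = {}
    for query, subject, p_value in hits:
        if p_value > max_pvalue:
            continue
        if query not in best or p_value < best[query][1]:
            best[query] = (subject, p_value)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_pvalue=1e-10):
    """Return the set of (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_pvalue)
    reverse = best_hits(b_vs_a, max_pvalue)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def conserved_set(human_fly_pairs, human_worm_pairs):
    """Human proteins that have a strict ortholog in both other genomes."""
    in_fly = {human for human, _ in human_fly_pairs}
    in_worm = {human for human, _ in human_worm_pairs}
    return in_fly & in_worm
```

Run on the two pairwise comparisons, `conserved_set` corresponds to the "in common between these sets" figure in the text; a fuller version would also reject pairs for which either protein has a more significant hit to a paralog in its own genome.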

[Fig. 16, slice labels, continued: category (number, percentage)]
extracellular matrix (12, 0.7%)
ion channel (7, 0.4%)
motor (13, 0.8%)
structural protein of muscle (8, 0.5%)
protooncogene (23, 1.3%)
intracellular transporter (51, 3.0%)
transporter (44, 2.6%)
synthase and synthetase (64, 3.7%)
oxidoreductase (64, 3.7%)
lyase (12, 0.7%)
ligase (9, 0.5%)
molecular function unknown (613, 35.8%)
hydrolase (80, 4.7%)
isomerase (21, 1.2%)

The distribution of the functions of the conserved protein set is shown in
Fig. 16. Comparison with Fig. 15 shows that, not surprisingly, the set of
conserved proteins is not distributed among molecular functions in the same
way as the whole human protein set. Compared with the whole human set (Fig.
15), there are several categories that are overrepresented in the conserved
set by a factor of ~2 or more. The first category is nucleic acid enzymes,
primarily the transcriptional machinery (notably DNA/RNA
methyltransferases, DNA/RNA polymerases, helicases, DNA ligases, DNA- and
RNA-processing factors, nucleases, and ribosomal proteins). The basic
transcriptional and translational machinery is well known to have been
conserved over evolution, from bacteria through to the most complex
eukaryotes. Many ribonucleoproteins involved in RNA splicing also appear to
be conserved among the animals. Other enzyme types are also overrepresented
(transferases, oxidoreductases, ligases, lyases, and isomerases). Many of
these enzymes are involved in intermediary metabolism. The only exception
is the hydrolase category, which is not significantly overrepresented in
the shared protein set. Proteases form the largest part of this category,
and several large protease families have expanded in each of these three
organisms after their divergence. The category of select regulatory
molecules is also overrepresented in the conserved set. The major conserved
families are small guanosine triphosphatases (GTPases) (especially the
Ras-related superfamily, including ADP ribosylation factor) and cell cycle
regulators (particularly the cullin family, cyclin C family, and several
cell division protein kinases). The last two significantly overrepresented
categories are protein transport and trafficking, and chaperones. The most
conserved groups in these categories are proteins involved in coated
vesicle-mediated transport, and chaperones involved in protein folding and
heat-shock response [particularly the DNAJ family, and heat-shock protein
60 (HSP60), HSP70, and HSP90 families]. These observations provide only a
conservative estimate of the protein families in the context of specific
cellular processes that were likely derived from the last common ancestor
of the human, fly, and worm. As stated before, this analysis does not
provide a complete estimate of conservation across the three animal
genomes, as paralogous duplication makes the determination of true
orthologs difficult within the members of conserved protein families.
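
The comparison of category shares between the conserved set and the whole protein set can be expressed as a simple fold-enrichment calculation. The Python sketch below is a hedged illustration: the function names, the dictionary inputs, and the default threshold of 2.0 are assumptions chosen for the example, and the category counts would have to be taken from the two figures.

```python
def category_fractions(counts):
    """Convert per-category counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no counts supplied")
    return {category: n / total for category, n in counts.items()}


def overrepresented(conserved_counts, whole_counts, min_factor=2.0):
    """Return {category: factor} for categories whose share of the
    conserved set is at least `min_factor` times their share of the
    whole set."""
    conserved = category_fractions(conserved_counts)
    whole = category_fractions(whole_counts)
    result = {}
    for category, fraction in conserved.items():
        baseline = whole.get(category, 0.0)
        if baseline > 0 and fraction / baseline >= min_factor:
            result[category] = fraction / baseline
    return result
```

Categories absent from the whole-set counts are skipped rather than reported as infinitely enriched, which is a design choice of this sketch rather than anything stated in the text.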



7.3 Differences between the human 
genome and other sequenced 
eukaryotic genomes 

To explore the molecular building blocks of 
the vertebrate taxon, we have compared the 
human genome with the other sequenced 
eukaryotic genomes at three levels: molec- 
ular functions, protein families, and protein 
domains. 

Molecular differences can be correlated with phenotypic differences to
begin to reveal the developmental and cellular processes that are unique to
the vertebrates. Tables 18 and 19 display a comparison among all sequenced
eukaryotic genomes, over selected protein/domain families (defined by
sequence similarity, e.g., the serine-threonine protein kinases) and
superfamilies (defined by shared molecular function, which may include
several sequence-related families, e.g., the cytokines). In these tables we
have focused on (super) families that are either very large or that differ
significantly in humans compared with the other sequenced eukaryote
genomes. We have found that the most prominent human expansions are in
proteins involved in (i) acquired immune functions; (ii) neural
development, structure, and functions; (iii) intercellular and
intracellular signaling pathways in development and homeostasis; (iv)
hemostasis; and (v) apoptosis.

Acquired immunity. One of the most striking differences between the human
genome and the Drosophila or C. elegans genome is the appearance of genes
involved in acquired immunity (Tables 18 and 19). This is expected, because
the acquired immune response is a defense system that only occurs in
vertebrates. We observe 22 class I and 22 class II major histocompatibility
complex (MHC) antigen genes and 114 other immunoglobulin genes in the human
genome. In addition, there are 59 genes in the cognate immunoglobulin
receptor family. At the domain level, this is exemplified by an expansion
and recruitment of the ancient immunoglobulin fold to constitute molecules
such as MHC, and of the integrin fold to form several of the cell adhesion
molecules that mediate interactions between immune effector cells and the
extracellular matrix. Vertebrate-specific proteins include the paracrine
immune regulators family of secreted 4-alpha helical bundle proteins,
namely the cytokines and chemokines. Some of the cytoplasmic signal
transduction components associated with cytokine receptor signal
transduction are also features that are poorly represented in the fly and
worm. These include protein domains found in the signal transducer and
activator of transcription (STATs), the suppressors of cytokine signaling
(SOCS), and protein inhibitors of activated STATs (PIAS). In contrast, many
of the animal-specific protein domains that play a role in innate immune
response, such as the Toll receptors, do not appear to be significantly
expanded in the human genome.

Neural development, structure, and function. In the human genome, as
compared with the worm and fly genomes, there is a marked increase in the
number of members of protein families that are involved in neural
development. Examples include neurotrophic factors such as ependymin, nerve
growth factor, and signaling molecules such as semaphorins, as well as the
number of proteins involved directly in neural structure and function such
as myelin proteins, voltage-gated ion channels, and synaptic proteins such
as synaptotagmin. These observations correlate well with the known
phenotypic differences between the nervous systems of these taxa, notably
(i) the increase in the number and connectivity of neurons; (ii) the
increase in number of distinct neural cell types (as many as a thousand or
more in human compared with a few hundred in fly and worm) (121); (iii) the
increased length of individual axons; and (iv) the significant increase in
glial cell number, especially the appearance of myelinating glial cells,
which are electrically inert supporting cells differentiated from the same
stem cells as neurons. A number of prominent protein expansions are
involved in the processes of neural development. Of the extracellular
domains that mediate cell adhesion, the connexin domain-containing proteins
(122) exist only in humans. These proteins, which are not present in the
Drosophila or C. elegans genomes, appear to provide the constitutive
subunits of intercellular channels and the structural basis for electrical
coupling. Pathway finding by axons and neuronal network formation is
mediated through a subset of ephrins and their cognate receptor tyrosine
kinases that act as positional labels to establish topographical
projections (123). The probable biological role for the semaphorins (22 in
human compared with 6 in the fly and 2 in the worm) and their receptors
(neuropilins and plexins) is that of axonal guidance molecules (124).
Signaling molecules such as neurotrophic factors and some cytokines have
been shown to regulate neuronal cell survival, proliferation, and axon
guidance (125). Notch receptors and ligands play important roles in glial
cell fate determination and gliogenesis (126).

Other human expanded gene families play key roles directly in neural
structure and function. One example is synaptotagmin (expanded more than
twofold in humans relative to the invertebrates), originally found to
regulate synaptic transmission by serving as a Ca2+ sensor (or receptor)
during synaptic vesicle fusion and release (127). Of interest is the
increased co-occurrence in humans of PDZ and the SH3 domains in
neuronal-specific adaptor molecules; examples include proteins that likely
modulate channel activity at synaptic junctions (128). We also noted
expansions in several ion-channel families (Table 19), including the EAG
subfamily (related to cyclic nucleotide gated channels), the voltage-gated
calcium/sodium channel family, the inward-rectifier potassium channel
family, and the voltage-gated potassium channel, alpha subunit family.
Voltage-gated sodium and potassium channels are involved in the generation
of action potentials in neurons. Together with voltage-gated calcium
channels, they also play a key role in coupling action potentials to
neurotransmitter release, in the development of neurites, and in short-term
memory. The recent observation of a calcium-regulated association between
sodium channels and synaptotagmin may have consequences for the
establishment and regulation of neuronal excitability (129).

Myelin basic protein and myelin-associated glycoprotein are major classes
of protein components in both the central and peripheral nervous system of
vertebrates. Myelin P0 is a major component of peripheral myelin, and
myelin proteolipid and myelin oligodendrocyte glycoprotein are found in the
central nervous system. Mutations in any of these

Table 18. Domain-based comparative analysis of proteins in H. sapiens (H),
D. melanogaster (F), C. elegans (W), S. cerevisiae (Y), and A. thaliana
(A). The predicted protein set of each of the above eukaryotic organisms
was analyzed with Pfam version 5.5 using E value cutoffs of 0.001. The
number of proteins containing the specified Pfam domains as well as the
total number of domains (in parentheses) are shown in each column. Domains
were categorized into cellular processes for presentation. Some domains
(i.e., SH2) are listed in more than one cellular process. Results of the
Pfam analysis may differ from results obtained based on human curation of
protein families, owing to the limitations of large-scale automatic
classifications. Representative examples of domains with reduced counts
owing to the stringent E value cutoff used for this analysis are marked
with a double asterisk (**). Examples include short divergent and
predominantly alpha-helical domains, and certain classes of cysteine-rich
zinc finger proteins.



Accession 
number 



Domain name 



Domain description 



H 



W 



PF02039 

PF00212 

PF00028 

PF00214 

PFOniO 

PF01093 

PF00029 

PF00976 

PF00473 

PF00007 

PF00778 

PF00322 

PF00812 

PF01404 

PF00167 

PF01534 

PF00236 

PF011S3 

PF01271 

PF02058 

PF00049 

PF00219 

PF02024 

PF00193 

PF00243 

PF02158 

PF06l84 

PF02070 

PF00066 

PF00865 

PF00159 

PF01279 

PF00123 

PF00341 

PF01403 

PF01033 

PF00l63 

PF02208 

PF02404 

PF01034 

PF00020 

PF00019 

PF01099 

PF01160 

PF00110 

PF01821 
PF00386 
PF00200 
PF00754 
PF01410 

.PF00039 
PF00040 

PF(X)051 

PF01823 

PF00354 

PF00277 

PF00084 

PF02210 

PF01108 

PF00868 

PF00927 



Adrenomedullin 
ANP 

Cadherin 

Calc_CGRPJAPP 

CNTF 

.-Ousterin . 
Connexin 
ACTH.domain 
CRF 

Cys.knot 
DIX 

Endothelin 

Ephrin 

EPhJbd 

FCF 

Frizzled 

Hormones 

Glypican 

Cranin 

Cuanylin 

Insulin 

IGFBP 

Leptin 

Xlink 

NCF 

Neuregulin 
Hormones 
NMU 
Notch 

Osteopontin 

Homione3 

Parathyroid 

Hormone2 

PDCF 

Sema 

Somatomedin_B 

Hormone 

Sorb 

SCF 

Syndecan 

TNFR_c6 

TGF-P 

Uteroglobin 

Opiods.neuropep 

Wnt 

ANATO 
Clq 

Disintegrin 

F5_F8_type_C 

COLFI 

Fnl 

Fn2 

Kringle 

MACPF 

Pentaxin 

SAA4)roteins 

Sushi 

TSPN 

Tissue Jac 

Transglutamin_N 

Transglutamin.C 



Developmental and homeostatic regulators
Adrenomedullin
Atrial natriuretic peptide
Cadherin domain
Calcitonin/CGRP/IAPP family
Ciliary neurotrophic factor
Clusterin
Connexin
Corticotropin ACTH domain
Corticotropin-releasing factor family
Cystine-knot domain
Dix domain
Endothelin family
Ephrin
Ephrin receptor ligand binding domain
Fibroblast growth factor
Frizzled/Smoothened family membrane region
Glycoprotein hormones
Glypican
Granin (chromogranin or secretogranin)
Guanylin precursor
Insulin/IGF/Relaxin family
Insulin-like growth factor binding proteins
Leptin
LINK (hyaluron binding)
Nerve growth factor family
Neuregulin family
Neurohypophysial hormones
Neuromedin U
Notch (DSL) domain
Osteopontin
Pancreatic hormone peptides
Parathyroid hormone family
Peptide hormone
Platelet-derived growth factor (PDGF)
Sema domain
Somatomedin B domain
Somatotropin
Sorbin homologous domain
Stem cell factor
Syndecan domain
TNFR/NGFR cysteine-rich region
Transforming growth factor β-like domain
Uteroglobin family
Vertebrate endogenous opioids neuropeptide
Wnt family of developmental signaling proteins

Hemostasis
Anaphylotoxin-like domain
C1q domain
Disintegrin
F5/8 type C domain
Fibrillar collagen C-terminal domain
Fibronectin type I domain
Fibronectin type II domain
Kringle domain
MAC/Perforin domain
Pentaxin family
Serum amyloid A protein
Sushi domain (SCR repeat)
Thrombospondin N-terminal-like domains
Tissue factor
Transglutaminase family
Transglutaminase family

^ 
2 

100(550) 
3 
1 
3 

14(16) 
1 
2 

10(11) 
5 
3 

7(8) 



0 
0 

14(157) 
0 
0 
0 
0 
0 
1 
2 
2 
0 
2 



12 


2 


23 


1 


9 


7 


1 


0 


14 


2 


3 


0 


1 


0 


7 


4 


10 


0 




0 


13(23) 


0 


3 


0 


4 


0 


1 . 


0 


1 


0 


3(5) 


2(4) 


1 


0 


3 


0 




0 


5(9) 


0 


5 


1 


27(29) 


8(10) 


5(8) 


3 


1 


0 


2 


0 


2 


0 


3 


1 


17(31) 


1 


27(28) 


6 


3 


0 


3 . 


0 


18 


7(10) 



0 
0 

16(66) 
0 
0 
0 
0 
0 
0 
0 
4 
0 
4. 
1 
1 
3 
0 
1 
0 
0 
0 
0 
0 

1 

0 
0 
0 
0 

2(6) 
0 
0 
0 
0 
0 

3(4) 
0 
0 
0 
0 

1 

0 
4 
0 
0 
5 



0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 



0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

b 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

6 

0 



6(14) 


0 


0 


0 


0 


24 


0 


0 


0 


0 


18 


2 


3 


0 


0 


15(20) 


5(6) 


2 


0 


0 


.10 


. 0 - 


0 


0 


0 


5(18) , 


. 0 


; ' 0 . 


0 


0 


.11(16) . 


0. 


0 


0. 


0 


15(24) 


2 


2 


0 


0 


6 


0 


0 


0 


0 


9 


0 


0 


0 


0 


4 


0 


0 


0 


0 


53(191) 


11(42) 


8(45) 


0 


0 


14 


1 


0 


0 


0 


1 


0 


0 


0 


0 


6 


1 


0 


0 


0 


8 


1 


0 


0 


0 



1338 



16 FEBRUARY 2001 VOL 291 SCIENCE www.sciencemag.org



Table 18 (Continued)






Accession 
number 



Domain name 



Domain description 



W 





uia 


PF00711 


Defensin_beta 


PF00748 


Calpainjnhib 


PFCX)666 


Cathelicidins 


■ PF00i29 


; . MHCJ 


PF00993 


- * ■ 

MHCJLalpha'* 


PF00969 


MHCJLbeta** 


PF00879 


Defensin_propep 


PFOn09 


CM_CSF 


PF00047 


»g 


PF00143 


Interferon 


PF00714 


IFN-gamma 


PF00726 


IL10 


PF02372 


IL15 


PF00715 


IL2 


PF00727 


IL4 


PF02025 


IL5 


PF01415 


IL7 


PrO0340 


IL1 


PF02394 


IL1_propep 


PF02059 


IL3 


PFCXMfig 


IL6 


PF01291 


UF.OSM 


PF00323 


Defensins 


rrU 11/3 1 


PTN_MK 


rrXAJcf 1 


SAA_prcteins 


rruuu«K> 


ILo 


PF01582 


TIR 


PF0O229 


TNF 


PCAAAQO 

rruuuoo 


1 reroii 


Pr00779 


BTK 


PrOOlDo 


C2 


PF00609 


DACKa 


Pr 00781 


DACKc 


PF00610 


DEP 


PF01363 


FYVE 


PF00996 


GDI 


PF00503 


C-alpha . 


PF00631 


C-gamma 


PF00616 


RasGAP 


PF00618 


RasGEFN 



PF00625 
PF02189 
PF00169 
PF00130 

PF00388 

PF00387 

PF00640 
PF02192 
PF00794 
PF01412 
PF02196 
PF02145 
PF00788 
PF00071 
PF00617 
PF00615 
PF02197 



Cuanylate.kin 
ITAM . 
PH 

DAG_PE-bind 
PI-PLC-X 
PI-PLC-Y 
PID 

PI3iep8SB 
PI3ierbd 
ArfCAP 
RBD 

Rap.GAP 

RA 

Ras 

RasCEF 

RCS 

Rlla 



Vitamin K-dependent carboxylation/gamma-carboxyglutamic (GLA) domain

Immune response
Beta defensin
Calpain inhibitor repeat
Cathelicidins
Class I histocompatibility antigen, domains alpha 1 and 2
Class II histocompatibility antigen, alpha domain
Class II histocompatibility antigen, beta domain
Defensin propeptide
Granulocyte-macrophage colony-stimulating factor
Immunoglobulin domain
Interferon alpha/beta domain
Interferon gamma
Interleukin-10
Interleukin-15
Interleukin-2
Interleukin-4
Interleukin-5
Interleukin-7/9 family
Interleukin-1
Interleukin-1 propeptide
Interleukin-3
Interleukin-6/G-CSF/MGF family
Leukemia inhibitory factor (LIF)/oncostatin (OSM) family
Mammalian defensin
PTN/MK heparin-binding protein
Serum amyloid A protein
Small cytokines (intercrine/chemokine), interleukin-8 like
TIR domain
TNF (tumor necrosis factor) family
Trefoil (P-type) domain

PI-PY-rho GTPase signaling
BTK motif
C2 domain
Diacylglycerol kinase accessory domain (presumed)
Diacylglycerol kinase catalytic domain (presumed)
Domain found in Dishevelled, Egl-10, and Pleckstrin (DEP)
FYVE zinc finger
GDP dissociation inhibitor
G-protein alpha subunit
G-protein gamma like domains
GTPase-activator protein for Ras-like GTPase
Guanine nucleotide exchange factor for Ras-like GTPases; N-terminal motif
Guanylate kinase
Immunoreceptor tyrosine-based activation motif
PH domain
Phorbol esters/diacylglycerol binding domain (C1 domain)
Phosphatidylinositol-specific phospholipase C, X domain
Phosphatidylinositol-specific phospholipase C, Y domain
Phosphotyrosine interaction domain (PTB/PID)
PI3-kinase family, p85-binding domain
PI3-kinase family, ras-binding domain
Putative GTP-ase activating protein for Arf
Raf-like Ras-binding domain
Rap/ran-GAP
Ras association (RalGDS/AF-6) domain
Ras family
RasGEF domain
Regulator of G protein signaling domain
Regulatory subunit of type II PKA R-subunit
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12 ^ 


0 


0 


b 


0 
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0 


2 


0 


0 


5 


1 


0 


0 


0 
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32(44) 
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66 (90) 


9 


4 


7 


0 


6 


10 


8 


8 


2 
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4 


10 


5 


2 
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14 


15 


5 


15 


6 


2 
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1 
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10 
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23 
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8 


11 
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1 


8 
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13 
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0 
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1 


1 


0 


0 


6 


3 


1 


0 


0 


16 


9 


8 


6 


15 
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4 


1 


0 


0 


5 


4 


2 


0 


0 


18(19) 


7(9) 


6 


1 


0 
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56(57) 


51 


23 


78 


21 
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7 


5 


0 


27 


6(7) 


12(13) 


1 


0 
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1 
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Accession 
number . 



Domain name 



Domain description 



W 



PF00620 
PF00621 
PFO0536 
PF01369 
PF00017 
PF00018 
PF01017 
PF00790 
PF00568 

PF00452 
PF02180 
PF00619 
PF0053T 
PF01335 
PF02179 
PF00656 
PF00653 

PF00022 
PF00191 
PF00402 
PF00373 
PF00880 
PF00681 
PF00435 
PF00418 
PF00992 
PF02209 
PF01044 



RhoGAP 

RhoGEF 

SAM 

Sec7 

SH2 

SH3 

STAT 

VHS 

WH1 

Bcl-2

BH4 

CARD 

Death 

DED 

BAG 

ICE_p20 

BIR 

Actin 

Annexin 

Calponin

Band_41 

Nebulin_repeat

Plectin_repeat 

Spectrin 

Tubulin-binding 

Troponin 

VHP

Vinculin 



PF01391 


Collagen 


PF01413 


C4 


PF00431 


CUB 


PF00008 


EGF 


PF00147 


Fibrinogen_C


PF00041 


Fn3 


PF00757 


Furin-like 


PF00357 


Integrin_A


PF00362


Integrin_B


PF00052


Laminin_B


PF00053


Laminin_EGF


PF00054


Laminin_G


PF00055


Laminin_Nterm


PF00059


Lectin_c


PF01463 


LRRCT 


PF01462 


LRRNT 


PF00057 


Ldl_recept_a


PF00058


Ldl_recept_b


PF00530 


SRCR 


PF00084 


Sushi 


PF00090 


Tsp_1


PF00092 


Vwa 


PF00093 


Vwc 


PF00094 


Vwd 


PF00244


14-3-3 


PF00023 


Ank 


PF00514 


Armadillo_seg


PF00168 


C2 


PF00027 


cNMP_binding


PF01556 


DnaJ_C 


PF00226 


DnaJ 


PF00036 


Efhand** 


PF00611 


FCH 


PF01846 


FF 


PF00498 


FHA 



RhoGAP domain
RhoGEF domain 

SAM domain (Sterile alpha motif) 
Sec7 domain 

Src homology 2 (SH2) domain 
Src homology 3 (SH3) domain 
STAT protein 
VHS domain 
WH1 domain 

Domains involved in apoptosis 

Bcl-2 

Bcl-2 homology region 4 
Caspase recruitment domain
Death domain 
Death effector domain 
Domain present in Hsp70 regulators 
ICE-like protease (caspase) p20 domain 
Inhibitor of Apoptosis domain 

Cytoskeletal

Actin 
Annexin 
Calponin family 

FERM domain (Band 4.1 family) 
Nebulin repeat 
Plectin repeat 
Spectrin repeat 

Tau and MAP proteins, tubulin-binding 
Troponin 

Villin headpiece domain 
Vinculin family 

ECM adhesion
Collagen triple helix repeat (20 copies)
C-terminal tandem repeated domain in type 4 procollagen
CUB domain
EGF-like domain

Fibrinogen beta and gamma chains, C-terminal globular domain
Fibronectin type III domain 
Furin-like cysteine rich region 
Integrin alpha cytoplasmic region 
Integrins, beta chain 
Laminin B (Domain IV) 
Laminin EGF-like (Domains III and V)
Laminin G domain
Laminin N-terminal (Domain VI) 
Lectin C-type domain 
Leucine rich repeat C-terminal domain 
Leucine rich repeat N-terminal domain 
Low-density lipoprotein receptor domain class A 
Low-density lipoprotein receptor repeat class B 
Scavenger receptor cysteine-rich domain 
Sushi domain (SCR repeat) 
Thrombospondin type 1 domain 
von Willebrand factor type A domain 
von Willebrand factor type C domain 
von Willebrand factor type D domain

Protein interaction domains 

14-3-3 proteins 
Ank repeat 

Armadillo/beta-catenin-like repeats 
C2 domain

Cyclic nucleotide-binding domain
DnaJ C terminal region 
DnaJ domain 
EF hand 

Fes/CIP4 homology domain 
FF domain 
FHA domain 
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19 
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9 


46 
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18 09) 
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15 


8 


3 


13 


5 


5 


5 
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1 
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46(61) 
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7 

f 


1 


1(2) 


0 


4 


2 


4 


4 


7 


2 
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1 


9 


2 


1 


0 


3 


0 


1 


0 


16 


0 
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0 


16 


5 


7 


0 


4(5) 


0 


0 


0 


5(8) 


3 


2 


1 


11 


7 


3 


0 


8(14) 


5(9) 


2(3) 


1 (2) 


61 (64) 


15(16) 


12 


9(11) 


16(55) 


4(16) 


4(11) 


0 


13(22) 


3 


7(19) 


0 


29(30) 


17(19) 


11(14) 


0 


4(148) 


1(2) 


1 


0 


2(11) 


0 


0 


0 


31 (195) 


13(171) 


10(93) 


0 


4(12) 


1(4) 


2(8) 


0 


4 


6 


8 


0 


5 


2 


2 


0 


4


2 


1 


0 



8 
0 
6 
9 

. 3 
4 
0 
8 
0 

. 0 
0 
0 
0 
0 
5 
0 
0 

24 
6(16) 
0 
0 
0 
0 
0 
0 
0 
5 

0. 



65 (279) 


. 10(46) 


174(384) 


. 0 


0 


6(11) 


2(4) 


3(6) 


0 


0 


47(69) 


.9(47) 


43(67) 


0 


0 


108 (420) 


45 (186) 


54(157) 


0 


1 


26 


10(11) 


6 


0 


0 


106(545) 


42 (168) 


34(156) 


0 


1 


5 


2 


1 


0 


0 


3 


1 


2 


0 


0 


8 


2 


2 


0 


0 


8(12) 


4(7) 


6(10) 


0 


0 


24(126) 


9(62) 


11(65) 


0 


0 


30(57) 


18(42) 


14(26) 


0 


0 


10 


6 


4 


0 


0 


47(76) 


23(24) 


91 (132) 


0 


0 


69(81) 


23(30) 


7(9) 


0 


0 


40(44) 


7(13) 


3(6) 


0 


0 


35(127) 


33(152) 


27(113) 


0 


0 


15(96) 


9(56) 


7(22) 


0 


0 


11(46) 


4(8) 


1(2) 


0 


0 
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11(42) 


8(45) 


0 


0 


41 (66) 


11(23) 


18(47) 
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34(58) 


0 
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0 


1 
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2(5) 


0 


0 


15(35) 
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..... 9 . 
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3 


. 3 
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15 


145(404) 


72 (269) 
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73 (101) 
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24(35) 


6(9) 


66 (90) 


26(31) 


21 (33) 


15(20) 


2(3) 


22 


12 


9 


5 


3 


19 


44 


34 


33 


20 


93 


83 (151) 


64(117) 


41 (86) 


4(11) . 


120 (328) 


9 


3 




4 


0 


4(11) 


4(10) 


3(16) 


2(5) 


4(8) 


13 


- 15 


7 


13(14) 


17 
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myelin proteins result in severe demyelina- 
tion, which is a pathological condition in
which the myelin is lost and the nerve conduction
is severely impaired (130). Humans
have at least 10 genes belonging to four
different families involved in myelin production
(five myelin P0, three myelin proteolipid,
myelin basic protein, and myelin-oligodendrocyte
glycoprotein, or MOG), and possibly
more remotely related members of the
MOG family. Flies have only a single myelin
proteolipid, and worms have none at all.

Table 18 (Continued)

The Human Genome



Intercellular and intracellular signaling
pathways in development and homeostasis.
Many protein families that have expanded in
humans relative to the invertebrates are involved
in signaling processes, particularly in
response to development and differentiation



Accession 
number


Domain name 




rKDr 


r ru 1 37w 




rrU l>*r*f 




PF00560 


LRR** 


PF00917 


MATH 


PF00989 


PAS 


PF00595 


PDZ 


PF00169 


PH 


PF01535 


PPR** 


PF00536 


SAM 


PF01369 


Sec7


PF00017 


SH2 


PF00018 


SH3 


PF01740 


STAS 


PF00515 


TPR** 


PF00400 


WD40** 


PF00397 


WW 


PF00569 


ZZ


PF01754 


Zf-A20


PF01388 


ARID 


PF01426 


BAH 


PF00643 


Zf-B_box**


PF00533


BRCT 


PF00439 


Bromodomain


PF00651 


BTB 


PF00145 


DNA_methylase


PF00385 


Chromo 


PF00125 


Histone


PF00134 


Cyclin 


PF00270 


DEAD 


PF01529 


Zf-DHHC 


PF00646 


F-box** 


PF00250 


Fork_head


PF00320 


GATA 


PF01585 


G-patch


PF00010


HLH** 


PF00850 


Hist_deacetyl 


PF00046 


Homeobox 


PF01833 


TIG 


PF02373 


JmjC 


PF02375 


JmjN 


PF00013 


KH-domain 


PF01352 


KRAB 


PF00104 


Hormone_rec 



Domain description. 



n 


r 
r 


w. . 


y 


'■ " A 


15(20) 


7(8) 
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2 
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65 (124) 
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98(226) 
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11(15) 
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13 
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2 
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21(25) 
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1 


2 
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10(18) 


. 23(35) 


. 10(16) 


12(16) 


37(48) 
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18 (26) 


10(15) 


28 


97 (98) 


. 62 (64) 


86(91) 


1(2). 


30(31) 


3(4) 


1 
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0 


13(15) 


24(27) 


14(15) 


17(18) 


1(2) 


12 


75(81) 
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71 (73) 


8 


48 


19 


10 


10 


11 


35 


63 (66) 


48(50) 


55(57) 


50(52) 


84(87) 


15 


20 


16 


7 


22 


16 


15 


309(324) 


9 


165(167) 


35(36) 
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8(10) 


9 
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18 
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14(15) 


60(61) 
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24 


4 


39 
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10 
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82 (84) 


6 


66 
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1 
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2 


3 


7 


28(67) . 


.14(32) 


17(46) 


4(14) 


27(61) 


204(243) 


.0 


0 


0 


0 


47 


17 


142(147) 


0 


0 


62 (129) 


33 (83) 


33 (79) 


4(7) 


10(16) 


11 


5 


88(161) 


1 


61 (74) 


32(43) 


18(24) 


17(24) 


15(20) 


243 (401) 


1 


0 


0 


0 


0 


14 


14 


9 


1 


7 


68(86) 


40(53) 


32(44) 


14(15) 


96(105) 


15 


5 


4 


0 


0 


7 


2 


1 


1 


0 


224(324) 


127(199) 


94(145) 


43(73) 


232 (369) 


15 


8 


5 


5 


6(7) 


44(51) 


10(12) 


5(7) 


3 


6 


10 


2 


6 


0 


23 


17(19) 


8 


22 


0 


0 



PF00412 
PF00917 
PF00249 
PF02344 
PF01753 
PF00628 
PF00157 
PF02257 
PF00076 

PF02037 
PF00622 
PF01852 
PF00907 



LIM
MATH 

Myb_DNA-binding 

Myc-LZ

Zf-MYND 

PHD 

Pou 

RFX_DNA_binding
Rrm

SAP 
SPRY 
START 
T-box 



FKBP-type peptidyl-prolyl cis-trans isomerases

GAF domain

Kelch motif

Leucine Rich Repeat

MATH domain

PAS domain

PDZ domain (Also known as DHR or GLGF)
PH domain 
PPR repeat 

SAM domain (Sterile alpha motif) 
Sec7 domain 

Src homology 2 (SH2) domain
Src homology 3 (SH3) domain 
STAS domain 
TPR domain 
WD40 domain 
WW domain 

ZZ-Zinc finger present in dystrophin, CBP/p300
Nuclear interaction domains

A20-like zinc finger
ARID DNA binding domain
BAH domain
B-box zinc finger

BRCA1 C Terminus (BRCT) domain

Bromodomain

BTB/POZ domain

C-5 cytosine-specific DNA methylase
'chromo' (CHRromatin Organization Modifier) domain

Core histone H2A/H2B/H3/H4
Cyclin

DEAD/DEAH box helicase
DHHC zinc finger domain 
F-box domain 
Fork head domain 
GATA zinc finger 
G-patch domain 

Helix-loop-helix DNA-binding domain 
Histone deacetylase family 
Homeobox domain
IPT/TIG domain
JmjC domain
JmjN domain
KH domain
KRAB box

Ligand-binding domain of nuclear hormone receptor
LIM domain containing proteins
MATH domain

Myb-like DNA-binding domain
Myc leucine zipper domain
MYND finger
PHD-finger

Pou domain - N-terminal to homeobox domain

RFX DNA-binding domain

RNA recognition motif (a.k.a. RRM, RBD, or RNP domain)
SAP domain 
SPRY domain 
START domain 
T-box 



www.sciencemag.org SCIENCE VOL 291 16 FEBRUARY 2001 1341






Table 18 (Continued) 



Accession  Domain name  Domain description                                         H          F         W        Y       A
PF02135    Zf-TAZ       TAZ finger                                                 2(3)       1(2)      6(7)     0       10(15)
PF01285    TEA          TEA domain                                                 4          1         1        1       0
PF02176    Zf-TRAF      TRAF-type zinc finger                                      6(9)       1(3)      1        0       2
PF00352    TBP          Transcription factor TFIID (or TATA-binding protein, TBP)  2(4)       4(8)      2(4)     1(2)    2(4)
PF00567    TUDOR        TUDOR domain                                               9(24)      9(19)     4(5)     0       2
PF00642    Zf-CCCH      Zinc finger C-x8-C-x5-C-x3-H type (and similar)            17(22)     6(8)      22(42)   3(5)    31(46)
PF00096    Zf-C2H2**    Zinc finger, C2H2 type                                     564(4500)  234(771)  68(155)  34(56)  21(24)
PF00097    Zf-C3HC4     Zinc finger, C3HC4 type (RING finger)                      135(137)   57        88(89)   18      298(304)
PF00098    Zf-CCHC      Zinc knuckle                                               9(17)      6(10)     17(33)   7(13)   68(91)



(Tables 18 and 19). They include secreted
hormones and growth factors, receptors,
intracellular signaling molecules, and
transcription factors.

Developmental signaling molecules that are
enriched in the human genome include growth
factors such as wnt, transforming growth
factor-β (TGF-β), fibroblast growth factor (FGF),
nerve growth factor, platelet-derived growth
factor (PDGF), and ephrins. These growth factors
affect tissue differentiation and a wide
range of cellular processes involving
actin-cytoskeletal and nuclear regulation. The
corresponding receptors of these developmental
ligands are also expanded in humans. For example,
our analysis suggests at least 8 human
ephrin genes (2 in the fly, 4 in the worm) and 12
ephrin receptors (2 in the fly, 1 in the worm). In
the wnt signaling pathway, we find 18 wnt
family genes (6 in the fly, 5 in the worm) and
12 frizzled receptors (6 in the fly, 5 in the
worm). The Groucho family of transcriptional
corepressors downstream in the wnt pathway
is even more markedly expanded, with 13
predicted members in humans (2 in the fly, 1 in
the worm).
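
The family counts quoted in the paragraph above can be turned into fold expansions relative to the fly and the worm. The following Python sketch only does that arithmetic; the dictionary layout and the names `FAMILY_COUNTS`, `fold_expansion`, and `summarize` are illustrative assumptions rather than part of the original analysis, and the counts are copied from the text.

```python
# Gene counts quoted in the text, stored as (human, fly, worm).
# The data layout and all names here are illustrative assumptions.
FAMILY_COUNTS = {
    "ephrin": (8, 2, 4),
    "ephrin receptor": (12, 2, 1),
    "wnt": (18, 6, 5),
    "frizzled receptor": (12, 6, 5),
    "Groucho": (13, 2, 1),
}


def fold_expansion(human, other):
    """Return the human count divided by the count in another organism."""
    if other == 0:
        raise ValueError("fold expansion is undefined when the other count is zero")
    return human / other


def summarize(counts):
    """Map each family to its (vs_fly, vs_worm) fold expansion, rounded to one decimal."""
    return {
        family: (round(fold_expansion(h, f), 1), round(fold_expansion(h, w), 1))
        for family, (h, f, w) in counts.items()
    }


if __name__ == "__main__":
    for family, (vs_fly, vs_worm) in summarize(FAMILY_COUNTS).items():
        print(f"{family}: {vs_fly}x fly, {vs_worm}x worm")
```

On these numbers the Groucho family shows the largest ratio of the five, which agrees with the text's remark that it is "even more markedly expanded".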

Extracellular adhesion molecules involved
in signaling are expanded in the human genome
(Tables 18 and 19). The interactions of several
of these adhesion domains with extracellular
matrix proteoglycans play a critical role in host
defense, morphogenesis, and tissue repair
(131). Consistent with the well-defined role of
heparan sulfate proteoglycans in modulating
these interactions (132), we observe an expansion
of the heparin sulfate sulfotransferases in
the human genome relative to worm and fly.
These sulfotransferases modulate tissue
differentiation (133). A similar expansion in humans
is noted in structural proteins that constitute the
actin-cytoskeletal architecture. Compared with
the fly and worm, we observe an explosive
expansion of the nebulin (35 domains per protein
on average), aggrecan (12 domains per
protein on average), and plectin (5 domains per
protein on average) repeats in humans. These
repeats are present in proteins involved in
modulating the actin-cytoskeleton with predominant
expression in neuronal, muscle, and vascular
tissues.



Comparison across the five sequenced eukaryotic
organisms revealed several expanded
protein families and domains involved in
cytoplasmic signal transduction (Table 18).
In particular, signal transduction pathways
playing roles in developmental regulation and
acquired immunity were substantially enriched.
There is a factor of 2 or greater expansion
in humans in the Ras superfamily
GTPases and the GTPase activator and GTP
exchange factors associated with them.
Although there are about the same number of
tyrosine kinases in the human and C. elegans
genomes, in humans there is an increase in
the SH2, PTB, and ITAM domains involved
in phosphotyrosine signal transduction.
Further, there is a twofold expansion of
phosphodiesterases in the human genome compared
with either the worm or fly genomes.
The downstream effectors of the intracellular
signaling molecules include the transcription
factors that transduce developmental fates.
Significant expansions are noted in the
ligand-binding nuclear hormone receptor class of
transcription factors compared with the fly genome,
although not to the extent observed in the worm
(Tables 18 and 19). Perhaps the most striking
expansion in humans is in the C2H2 zinc finger
transcription factors. Pfam detects a total of
4500 C2H2 zinc finger domains in 564 human
proteins, compared with 771 in 234 fly proteins.
This means that there has been a dramatic
expansion not only in the number of C2H2
transcription factors, but also in the number of
these DNA-binding motifs per transcription
factor (8 on average in humans, 3.3 on average
in the fly, and 2.3 on average in the worm).
Furthermore, many of these transcription factors
contain either the KRAB or SCAN domains,
which are not found in the fly or worm
genomes. These domains are involved in the
oligomerization of transcription factors and
increase the combinatorial partnering of these
factors. In general, most of the transcription
factor domains are shared between the three
animal genomes, but the reassortment of these
domains results in organism-specific transcription
factor families. The domain combinations
found in the human, fly, and worm include the
BTB with C2H2 in the fly and humans, and
homeodomains alone or in combination with
Pou and LIM domains in all of the animal
genomes. In plants, however, a different set of
transcription factors are expanded, namely, the
myb family, and a unique set that includes VP1
and AP2 domain-containing proteins (134).
The yeast genome has a paucity of transcription
factors compared with the multicellular
eukaryotes, and its repertoire is limited to the
expansion of the yeast-specific C6 transcription
factor family involved in metabolic regulation.
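
The per-protein averages quoted above for the C2H2 zinc fingers (8, 3.3, and 2.3) follow from dividing domain counts by protein counts. The Python sketch below shows one way to check them from table cells written as `564(4500)`. It rests on two assumptions that are not stated in this section: that the first number in a cell is the protein count and the parenthesized number the domain count (inferred from the sentence pairing 4500 domains with 564 human proteins), and that a bare number means one domain per protein. The names `parse_cell` and `domains_per_protein` are hypothetical.

```python
import re

# Cell notation as it appears in Table 18, e.g. "564(4500)".
# Assumption: first number = proteins, parenthesized number = domains.
CELL_PATTERN = re.compile(r"^\s*(\d+)\s*(?:\(\s*(\d+)\s*\))?\s*$")


def parse_cell(cell):
    """Parse 'N' or 'N(M)' into (proteins, domains).

    Assumption: a bare 'N' is read as N proteins carrying N domains.
    """
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    proteins = int(match.group(1))
    domains = int(match.group(2)) if match.group(2) is not None else proteins
    return proteins, domains


def domains_per_protein(cell):
    """Average number of domains per protein for one table cell."""
    proteins, domains = parse_cell(cell)
    if proteins == 0:
        return 0.0
    return domains / proteins


if __name__ == "__main__":
    # Human, fly, and worm C2H2 cells as they appear in Table 18.
    for label, cell in [("human", "564(4500)"), ("fly", "234(771)"), ("worm", "68(155)")]:
        print(f"{label}: {domains_per_protein(cell):.1f} domains per protein")
```

Rounded to one decimal place, the three cells reproduce the averages given in the text.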
While we have illustrated expansions in a
subset of signal transduction molecules in the
human genome compared with the other
eukaryotic genomes, it should be noted that
most of the protein domains are highly conserved.
An interesting observation is that
worms and humans have approximately the
same number of both tyrosine kinases and
serine/threonine kinases (Table 19). It is
important to note, however, that these are merely
counts of the catalytic domain; the proteins
that contain these domains also display a
wide repertoire of interaction domains with
significant combinatorial diversity.
Hemostasis. Hemostasis is regulated primarily
by plasma proteases of the coagulation
pathway and by the interactions that occur
between the vascular endothelium and platelets.
Consistent with known anatomical and physiological
differences between vertebrates and invertebrates,
extracellular adhesion domains that
constitute proteins integral to hemostasis are
expanded in the human relative to the fly and
worm (Tables 18 and 19). We note the evolution
of domains such as FIMAC, FN1, FN2,
and C1q that mediate surface interactions
between hematopoietic cells and the vascular
matrix. In addition, there has been extensive
recruitment of more-ancient animal-specific
domains such as VWA, VWC, VWD, kringle,
and FN3 into multidomain proteins that are
involved in hemostatic regulation. Although we
do not find a large expansion in the total number
of serine proteases, this enzymatic domain
has been specifically recruited into several of
these multidomain proteins for proteolytic
regulation in the vascular compartment. These are
represented in plasma proteins that belong to
the kinin and complement pathways. There is a
significant expansion in two families of matrix
metalloproteases: ADAM (a disintegrin and
metalloprotease) and MMPs (matrix
metalloproteases) (Table 19). Proteolysis of
extracellular matrix (ECM) proteins is critical for tissue
development and for tissue degradation in
diseases such as cancer, arthritis, Alzheimer's
disease, and a variety of inflammatory conditions
(135, 136). ADAMs are a family of integral
membrane proteins with a pivotal role in
fibrinogenolysis and modulating interactions
between hematopoietic components and the
vascular matrix components. These proteins
have been shown to cleave matrix proteins,
and even signaling molecules: ADAM-17
converts tumor necrosis factor-α, and
ADAM-10 has been implicated in the Notch
signaling pathway (135). We have identified
19 members of the matrix metalloprotease
family, and a total of 51 members of the
ADAM and ADAM-TS families.

Apoptosis. Evolutionary conservation of
some of the apoptotic pathway components
across eukarya is consistent with its central
role in developmental regulation and as a
response to pathogens and stress signals. The
signal transduction pathways involved in
programmed cell death, or apoptosis, are mediated
by interactions between well-characterized
domains that include extracellular domains,
adaptor (protein-protein interaction)
domains, and those found in effector and
regulatory enzymes (137). We enumerated
the protein counts of central adaptor and
effector enzyme domains that are found only in
the apoptotic pathways to provide an estimate
of divergence across eukarya and relative
expansion in the human genome when compared
with the fly and worm (Table 18).
Adaptor domains found in proteins restricted
only to apoptotic regulation such as the DED
domains are vertebrate-specific, whereas others
like BIR, CARD, and Bcl2 are represented
in the fly and worm (although the number
of Bcl2 family members in humans is
significantly expanded). Although plants and yeast
lack the caspases, caspase-like molecules,
namely the para- and meta-caspases, have
been reported in these organisms (138).
Compared with other animal genomes, the human
genome shows an expansion in the adaptor
and effector domain-containing proteins
involved in apoptosis, as well as in the
proteases involved in the cascade such as the
caspase and calpain families.

Expansions of other protein families.
Metabolic enzymes. There are fewer cytochrome
P450 genes in humans than in either
the fly or worm. Lipoxygenases (six in humans),
on the other hand, appear to be specific
to the vertebrates and plants, whereas the
lipoxygenase-activating proteins (four in humans)
may be vertebrate-specific. Lipoxygenases are
involved in arachidonic acid metabolism, and
they and their activators have been implicated
in diverse human pathology ranging from
allergic responses to cancers. One of the most
surprising human expansions, however, is in
the number of glyceraldehyde-3-phosphate
dehydrogenase (GAPDH) genes (46 in humans,
3 in the fly, and 4 in the worm). There
is, however, evidence for many retrotransposed
GAPDH pseudogenes (139), which
may account for this apparent expansion.
However, it is interesting that GAPDH, long
known as a conserved enzyme involved in
basic metabolism found across all phyla from
bacteria to humans, has recently been shown
to have other functions. It has a second cat-



Table 19. Number of proteins assigned to selected Panther families or subfamilies in H. sapiens (H), D
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Neural 



Ependymin
Ion channels
Acetylcholine receptor
Amiloride-sensitive/degenerin
CNG/EAG
IRK

ITP/ryanodine
Neurotransmitter-gated
P2X purinoceptor
TASK

Transient receptor
Voltage-gated Ca2+ alpha
Voltage-gated Ca2+ alpha-2
Voltage-gated Ca2+ beta
Voltage-gated Ca2+ gamma
Voltage-gated K+ alpha
Voltage-gated KQT
Voltage-gated Na+
Myelin basic protein
Myelin P0
Myelin proteolipid

Myelin-oligodendrocyte glycoprotein
Neuropilin
Plexin
Semaphorin
Synaptotagmin

Defensin
Cytokine†
GCSF
GMCSF

Intercrine alpha
Intercrine beta
Interferon
Interleukin

Leukemia inhibitory factor
MCSF

Peptidoglycan recognition protein
Pre-B cell enhancing factor
Small inducible cytokine A
SI cytokine
TNF

Cytokine receptor†
Bradykinin/C-C chemokine receptor
Fl cytokine receptor
Interferon receptor
Interleukin receptor
Leukocyte tyrosine kinase receptor
MCSF receptor
TNF receptor
Immunoglobulin receptor†
T-cell receptor alpha chain
T-cell receptor beta chain
T-cell receptor gamma chain
T-cell receptor delta chain
Immunoglobulin FC receptor
Killer cell receptor
Polymeric-immunoglobulin receptor
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11 
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12 
15 
22 
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5 

1 

33 

6 
11 

1 

5 

3 

1 

2 

9 
22 

Immune response 



12 
24 
9 
3 
2 
51 
0 
12 
3 
4 
3 
2 
0 
5 
2 
4 
0 
0 
1 
0 
0 
2 
6 
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3 
86 
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5 
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26 



2 
1 

14 
2 
9 
62 
7 
2 
3 
32 
3 

1 

3 
59 
16 
15 

1 

1 

8 
16 

4 



0 
14 
0 
0 
0 
0 
0 
1 
0 
0 
13 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 

0 
0 
0 
0 
0 
0 
0 
0 
0 
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3 
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59 
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48 
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11 

3 
4 
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.0 
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2 
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alytic activity, as a uracil DNA glycosylase 
(140) and functions as a cell cycle regulator
(141) and has even been implicated in
apoptosis (142).

Translation. Another striking set of human
expansions has occurred in certain families
involved in the translational machinery.
We identified 28 different ribosomal subunits
that each have at least 10 copies in the genome;
on average, for all ribosomal proteins
there is about an 8- to 10-fold expansion in
the number of genes relative to either the
worm or fly. Retrotransposed pseudogenes

Table 19 (Continued) 
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may account for many of these expansions 
[see the discussion above and (143)]. Recent
evidence suggests that a number of ribosomal
proteins have secondary functions independent
of their involvement in protein biosynthesis;
for example, L13a and the related L7
subunits (36 copies in humans) have been
shown to induce apoptosis (144).
There is also a four- to fivefold expansion
in the elongation factor 1-alpha family
(eEF1A; 56 human genes). Many of these
expansions likely represent intronless paralogs
that have presumably arisen from retro-
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Toll receptor-related 
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0 


0 
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FCF 


24 
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Glucagon 
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Glycoprotein hormone beta chain 
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0 
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Insulin 
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0 
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Insulin-like hormone 
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. 0 
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Ner^e growth factor 
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0 . 
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Neuregulin/heregulin 
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neuropeptide Y 
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PDGF 
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Relaxin 
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Thymopoeitin 
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29 
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VECF 
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Wnt 


18 
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Ephrin receptor 


12 
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FCF receptor 
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Frizzled receptor 
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Parathyroid hormone receptor 
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VECF receptor 
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BDNF/NT-3 nerve growth factor 
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Dual-specificity protein phosphatase 
S/T and dual-specificity protein kinase†
S/T protein phosphatase

Y protein kinase†

Y protein phosphatase

ARF family

Cyclic nucleotide phosphodiesterase

G protein-coupled receptors†‡

G-protein alpha

G-protein beta

G-protein gamma

Ras superfamily

G-protein modulators†

ARF GTPase-activating

Neurofibromin

Ras GTPase-activating

Tuberin 

Vav proto-oncogene family 
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2 
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0 


35 


15 


13 


3 


0 



transposition, and again there is evidence that
many of these may be pseudogenes (145).
However, a second form (eEF1A2) of this
factor has been identified with tissue-specific
expression in skeletal muscle and a complementary
expression pattern to the ubiquitously
expressed eEF1A (146).

Ribonucleoproteins. Alternative splicing
results in multiple transcripts from a single
gene, and can therefore generate additional
diversity in an organism's protein complement.
We have identified 269 genes for
ribonucleoproteins. This represents over 2.5
times the number of ribonucleoprotein genes
in the worm, two times that of the fly, and
about the same as the 265 identified in the
Arabidopsis genome. Whether the diversity
of ribonucleoprotein genes in humans contributes
to gene regulation at either the splicing
or translational level is unknown.

Posttranslational modifications. In this
set of processes, the most prominent expansion
is the transglutaminases, calcium-dependent
enzymes that catalyze the cross-linking
of proteins in cellular processes such as
hemostasis and apoptosis (147). The vitamin
K-dependent gamma carboxylase gene product
acts on the GLA domain (missing in the
fly and worm) found in coagulation factors,
osteocalcin, and matrix GLA protein (148).
Tyrosylprotein sulfotransferases participate
in the posttranslational modification of proteins
involved in inflammation and hemostasis,
including coagulation factors and chemokine
receptors (149). Although there is no
significant numerical increase in the counts
for domains involved in nuclear protein
modification, there are a number of domain
arrangements in the predicted human proteins
that are not found in the other currently
sequenced genomes. These include the tandem
association of two histone deacetylase domains
in HD6 with a ubiquitin finger domain,
a feature lacking in the fly genome. An
additional example is the co-occurrence of the
important nuclear regulatory enzyme PARP
(poly-ADP ribosyl transferase) domain fused
to protein-interaction domains (BRCT and
VWA) in humans.

Concluding remarks. There are several
possible explanations for the differences in
phenotypic complexity observed in humans
when compared to the fly and worm. Some of
these relate to the prominent differences in
the immune system, hemostasis, neuronal,
vascular, and cytoskeletal complexity. The
finding that the human genome contains fewer
genes than previously predicted might be
compensated for by combinatorial diversity
generated at the levels of protein architecture,
transcriptional and translational control,
posttranslational modification of proteins, or
posttranscriptional regulation. Extensive domain
shuffling to increase or alter combinatorial
diversity can provide an exponential
increase in the ability to mediate protein-protein
interactions without dramatically increasing
the absolute size of the protein complement
(150). Evolution of apparently new
(from the perspective of sequence analysis)
protein domains and increasing regulatory
complexity by domain accretion both quantitatively
and qualitatively (recruitment of novel
domains with preexisting ones) are two
features that we observe in humans. Perhaps
the best illustration of this trend is the C2H2
zinc finger-containing transcription factors,
where we see expansion in the number of
domains per protein, together with
vertebrate-specific domains such as KRAB and
SCAN. Recent reports on the prominent use
of internal ribosomal entry sites in the human
genome to regulate translation of specific
classes of proteins suggest that this is an area
that needs further research to identify the full
extent of this process in the human genome
(151). At the posttranslational level, although
we provide examples of expansions of some
protein families involved in these modifications,
further experimental evidence is required
to evaluate whether this is conflated
with increased complexity in protein processing.
Posttranscriptional processing and the
extent of isoform generation in the human
remain to be cataloged in their entirety. Given
the conserved nature of the spliceosomal
machinery, further analysis will be required to
dissect regulation at this level.




[Table residue, "Transcription factors/chromatin organization"; row labels
only, counts not recovered: C2H2 zinc finger-containing, COE, CREB,
ETS-related, Forkhead-related, FOS, Groucho, Histone H1, Histone H2A,
Histone H2B, Histone H3, Histone H4, Bithoraxoid, Distal-less,
LIM-containing, Paired box, Nuclear hormone receptor, Pou-related,
Runt-related]



8 Conclusions 

8.1 The whole-genome sequencing 
approach versus BAC by BAC 

Experience in applying the whole-genome 
shotgun sequencing approach to a diverse 
group of organisms with a wide range of 
genome sizes and repeat content allows us to 
assess its strengths and weaknesses. With the 
success of the method for a large number of 
microbial genomes, Drosophila, and now the 
human, there can be no doubt concerning the 
utility of this method. The large number of 
microbial genomes that have been sequenced 
by this method (75, 80, 152) demonstrates that
megabase-sized genomes can be sequenced
efficiently without any input other than the de
novo mate-paired sequences. With more
complex genomes like those of Drosophila or 
human, map information, in the form of well- 
ordered markers, has been critical for long- 
range ordering of scaffolds. For joining scaf- 
folds into chromosomes, the quality of the 
map (in terms of the order of the markers) is 
more important than the number of markers
per se. Although this mapping could have
been performed concurrently with sequenc-
ing, the prior existence of mapping data was
beneficial. During the sequencing of the A.
thaliana genome, sequencing of individual
BAC clones permitted extension of the se-
quence well into centromeric regions and al-
lowed high-quality resolution of complex re-
peat regions. Likewise, in Drosophila, the
BAC physical map was most useful in re- 
gions near the highly repetitive centromeres 
and telomeres. WGA has been found to de- 
liver excellent-quality reconstructions of the 
unique regions of the genome. As the genome 
size, and more importantly the repetitive con- 
tent, increases, the WGA approach delivers 
less of the repetitive sequence. 

The cost and overall efficiency of clone-by- 
clone approaches makes them difficult to justify 
as a stand-alone strategy for future large-scale 
genome-sequencing projects. Specific applica- 
tions of BAC-based or other clone mapping and 
sequencing strategies to resolve ambiguities in
sequence assembly that cannot be efficiently
resolved with computational approaches alone
are clearly worth exploring. Hybrid approaches
to whole-genome sequencing will only work if
there is sufficient coverage in both the whole-
genome shotgun phase and the BAC clone se-
quencing phase. Our experience with human
genome assembly suggests that this will require
at least 3× coverage of both whole-genome and
BAC shotgun sequence data.
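
As a rough guide to what a coverage figure means, the sketch below uses the idealized Poisson model of random shotgun sampling. That model is an assumption introduced here for illustration, not a calculation taken from the text: it ignores repeats, cloning bias, and read length, all of which matter for real assemblies. The function name is hypothetical.

```python
import math

def expected_unsampled_fraction(coverage: float) -> float:
    """Expected fraction of bases covered by zero reads when reads land
    uniformly at random (Poisson model, mean depth = coverage)."""
    return math.exp(-coverage)

# Compare a few depths, including 3x from each of two data sets (6x combined).
for depth in (1, 3, 6):
    missing = expected_unsampled_fraction(depth)
    print(f"{depth}x coverage: about {missing:.2%} of bases expected unsampled")
```

Under this simplified model, the gain from combining two data sets comes from the exponential drop in unsampled sequence as depth adds up; it says nothing about whether repeats can be resolved.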



8.2 The low gene number in humans

We have sequenced and assembled ~95% of
the euchromatic sequence of H. sapiens and
used a new automated gene prediction meth- 
od to produce a preliminary catalog of the 
human genes. This has provided a major sur- 
prise: We have found far fewer genes (26,000 
to 38,000) than the earlier molecular pre- 
dictions (50,000 to over 140,000). Whatever 
the reasons for this current disparity, only 
detailed annotation, comparative genomics 
(particularly using the Mus musculus ge-
nome), and careful molecular dissection of
complex phenotypes will clarify this critical
issue of the basic "parts list" of our genome.
Certainly, the analysis is still incomplete and
considerable refinement will occur in the
years to come as the precise structure of each
transcription unit is evaluated. A good place
to start is to determine why the gene esti-
mates derived from EST data are so discor-
dant with our predictions. It is likely that the
following contribute to an inflated gene num-
ber derived from ESTs: the variable lengths
of 3'- and 5'-untranslated leaders and trailers;
the little-understood vagaries of RNA pro- 
cessing that often leave intronic regions in an 
unspliced condition; the finding that nearly 
40% of human genes are alternatively spliced 
(153); and finally, the unsolved technical 
problems in EST library construction where 
contamination from heterogeneous nuclear 
RNA and genomic DNA are not uncommon. 
Of course, it is possible that there are genes 
that remain unpredicted owing to the absence 
of EST or protein data to support them, al- 
though our use of mouse genome data for 



predicting genes should limit this number. As
was true at the beginning of genome sequenc-
ing, ultimately it will be necessary to measure
mRNA in specific cell types to demonstrate
the presence of a gene.

J. B. S. Haldane speculated in 1937 that a
population of organisms might have to pay a
price for the number of genes it can possibly
carry. He theorized that when the number of
genes becomes too large, each zygote carries
so many new deleterious mutations that the
population simply cannot maintain itself. On
the basis of this premise, and on the basis of
available mutation rates and x-ray-induced
mutations at specific loci, Muller, in 1967
(154), calculated that the mammalian ge-
nome would contain a maximum of not much
more than 30,000 genes (155). An estimate of
30,000 gene loci for humans was also arrived
at by Crow and Kimura (156). Muller's esti-
mate for D. melanogaster was 10,000 genes,
compared to 13,000 derived by annotation of
the fly genome (26, 27). These arguments for
the theoretical maximum gene number were
based on simplified ideas of genetic load:
that all genes have a certain low rate of
mutation to a deleterious state. However, it is
clear that many mouse, fly, worm, and yeast
knockout mutations lead to almost no dis-
cernible phenotypic perturbations.

The modest number of human genes
means that we must look elsewhere for the
mechanisms that generate the complexities
inherent in human development and the so-
phisticated signaling systems that maintain
homeostasis. There are a large number of 
ways in which the functions of individual 
genes and gene products are regulated. The 
degree of "openness" of chromatin structure 
and hence transcriptional activity is regulated
by protein complexes that involve histone 
and DNA enzymatic modifications. We enu- 
merate many of the proteins that are likely 
involved in nuclear regulation in Table 19. 
The location, timing, and quantity of tran- 
scription are intimately linked to nuclear sig- 
nal transduction events as well as by the 
tissue-specific expression of many of these 
proteins. Equally important are regulatory 
DNA elements that include insulators, re-
peats, and endogenous viruses (157); meth-
ylation of CpG islands in imprinting (158);
and promoter-enhancer and intronic regions
that modulate transcription. The spliceosomal
machinery consists of multisubunit proteins
(Table 19) as well as structural and catalytic
RNA elements (159) that regulate transcript
structure through alternative start and termi- 
nation sites and splicing. Hence, there is a 
need to study different classes of RNA mol- 
ecules (160) such as small nucleolar RNAs, 
antisense riboregulator RNA, RNA involved 
in X-dosage compensation, and other struc- 
tural RNAs to appreciate their precise role in 
regulating gene expression. The phenomenon 



of RNA editing in which coding changes 
occur directly at the level of mRNA is of 
clinical and biological relevance (161). Final- 
ly, examples of translational control include 
internal ribosomal entry sites that are found 
in proteins involved in cell cycle regulation 
and apoptosis (162). At the protein level,
minor alterations in the nature of protein-
protein interactions, protein modifications,
and localization can have dramatic effects on
cellular physiology (163). This dynamic sys-
tem therefore has many ways to modulate
activity, which suggests that definition of 
complex systems by analysis of single genes 
is unlikely to be entirely successful. 

In situ studies have shown that the human 
genome is asymmetrically populated with 
G+C content, CpG islands, and genes (68).
However, the genes are not distributed quite
as unequally as had been predicted (Table 9)
(69). The most G+C-rich fraction of the ge-
nome, H3 isochores, constitute more of the
genome than previously thought (about 9%),
and are the most gene-dense fraction, but
contain only 25% of the genes, rather than the
predicted ~40%. The low G+C L isochores
make up 65% of the genome, and 48% of the
genes. This inhomogeneity, the net result of
millions of years of mammalian gene dupli-
cation, has been described as the "desertifi-
cation" of the vertebrate genome (77). Why
are there clustered regions of high and low
gene density, and are these accidents of his-
tory or driven by selection and evolution? If
these deserts are dispensable, it ought to be
possible to find mammalian genomes that are
far smaller in size than the human genome.
Indeed, many species of bats have genome
sizes that are much smaller than that of hu-
mans; for example, Miniopterus, a species of
Italian bat, has a genome size that is only
50% that of humans (164). Similarly, Mun-
tiacus, a species of Asian barking deer, has a
genome size that is ~70% that of humans.
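
The isochore percentages quoted above can be turned into relative gene densities with simple arithmetic. The sketch below uses only the figures stated in the text (genome share and gene share for the H3 and L isochores); the dictionary layout and function name are illustrative choices, and a value of 1.0 would mean genes are spread in proportion to sequence.

```python
# Shares quoted in the text: fraction of the genome and fraction of the genes.
isochores = {
    "H3": {"genome_share": 0.09, "gene_share": 0.25},
    "L": {"genome_share": 0.65, "gene_share": 0.48},
}

def relative_gene_density(genome_share: float, gene_share: float) -> float:
    """Gene share divided by genome share (1.0 = genome-average density)."""
    return gene_share / genome_share

densities = {name: relative_gene_density(**shares) for name, shares in isochores.items()}

# Density H3 would have had if it held the previously predicted ~40% of genes.
predicted_h3_density = relative_gene_density(0.09, 0.40)

for name, value in densities.items():
    print(f"{name} isochores: {value:.2f} times the genome-average gene density")
print(f"H3 under the earlier ~40% prediction: {predicted_h3_density:.2f} times")
```

Read this way, H3 remains the most gene-dense fraction, but by a smaller margin than the earlier prediction implied.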



8.3 Human DNA sequence variation 
and its distribution across the genome 

This is the first eukaryotic genome in which a
nearly uniform ascertainment of polymorphism
has been completed. Although we have identi-
fied and mapped more than 3 million SNPs, this
by no means implies that the task of finding and
cataloging SNPs is complete. These represent
only a fraction of the SNPs present in the
human population as a whole. Nevertheless,
this first glimpse at genome-wide variation has
revealed strong inhomogeneities in the distribu-
tion of SNPs across the genome. Polymorphism
in DNA carries with it a snapshot of the past
operation of population genetic forces, includ-
ing mutation, migration, selection, and genetic
drift. The availability of a dense array of SNPs
will allow questions related to each of these
factors to be addressed on a genome-wide basis.
SNP studies can establish the range of haplo-
types present in subjects of different ethnogeo-
graphic origins, providing insights into popula-
tion history and migration patterns. Although
such studies have suggested that modern human
lineages derive from Africa, many important
questions regarding human origins remain un-
answered, and more analyses using detailed
SNP maps will be needed to settle these con-
troversies. In addition to providing evidence for
population expansions, migration, and admix-
ture, SNPs can serve as markers for the extent
of evolutionary constraint acting on particular
genes. The correlation between patterns of in-
traspecies and interspecies genetic variation
may prove to be especially informative to iden-
tify sites of reduced genetic diversity that may
mark loci where sequence variations are not
tolerated.

The remarkable heterogeneity in SNP
density implies that there are a variety of
forces acting on polymorphism: sparse re-
gions may have lower SNP density because
the mutation rate is lower, because most of
those regions have a lower fraction of muta-
tions that are tolerated, or because recent
strong selection in favor of a newly arisen
allele "swept" the linked variation out of the
population (165). The effect of random ge- 
netic drift also varies widely across the ge- 
nome. The nonrecombining portion of the Y 
chromosome faces the strongest pressure 
from random drift because there are roughly 
one-quarter as many Y chromosomes in the
population as there are autosomal chromo-
somes, and the level of polymorphism on the
Y is correspondingly less. Similarly, the X 
chromosome has a smaller effective popu- 
lation size than the autosomes, and its nu- 
cleotide diversity is also reduced. But even 
across a single autosome, the effective pop- 
ulation size can vary because the density of 
deleterious mutations may vary. Regions of 
high density of deleterious mutations will 
see a greater rate of elimination by selec- 
tion, and the effective population size will 
be smaller (166). As a result, the density of
even completely neutral SNPs will be lower 
in such regions. There is a large literature 
on the association between SNP density 
and local recombination rates in Drosoph- 
ila, and it remains an important task to 
assess the strength of this association in the 
human genome, because of its impact on 
the design of local SNP densities for dis- 
ease-association studies. It also remains an 
important task to validate SNPs on a 
genomic scale in order to assess the degree 
of heterogeneity among geographic and 
ethnic populations. 
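
The chromosome-count argument above can be sketched numerically. The code below assumes an equal sex ratio (one male and one female per breeding pair), the standard neutral expectation that diversity scales as 4 x Ne x mu, and the same mutation rate on every chromosome; these are simplifying assumptions added here, and the population size and mutation rate in the example are placeholder values, not estimates from the text.

```python
from fractions import Fraction

# Chromosome copies carried by one male plus one female.
COPIES_PER_PAIR = {"autosome": 4, "X": 3, "Y": 1}

def relative_effective_size(chromosome: str) -> Fraction:
    """Effective population size relative to an autosome."""
    return Fraction(COPIES_PER_PAIR[chromosome], COPIES_PER_PAIR["autosome"])

def expected_neutral_diversity(ne_autosomal: float, mu: float, chromosome: str) -> float:
    """Neutral expectation 4 * Ne * mu, with Ne scaled by chromosome type."""
    return 4 * ne_autosomal * float(relative_effective_size(chromosome)) * mu

# Placeholder parameters, for illustration only.
example_ne, example_mu = 10_000, 2.5e-8
for chrom in ("autosome", "X", "Y"):
    ratio = relative_effective_size(chrom)
    diversity = expected_neutral_diversity(example_ne, example_mu, chrom)
    print(f"{chrom}: relative Ne = {ratio}, expected diversity = {diversity:.2e}")
```

The ordering (autosome above X above Y) follows from copy number alone; selection at linked sites, as described in the text, would reduce diversity further in some regions.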



8.4 Genome complexity 

We will soon be in a position to move away
from the cataloging of individual compo-
nents of the system, and beyond the sim-
plistic notions of "this binds to that, which
then docks on this, and then the complex
moves there" (167) to the exciting area
of network perturbations, nonlinear re-
sponses and thresholds, and their pivotal
role in human diseases.

The enumeration of other "parts lists" re-
veals that in organisms with complex nervous
systems, neither gene number, neuron number,
nor number of cell types correlates in any
meaningful manner with even simplistic mea-
sures of structural or behavioral complexity.
Nor would they be expected to; this is the realm
of nonlinearities and epigenesis (168). The 520
million neurons of the common octopus exceed
the neuronal number in the brain of a mouse by
an order of magnitude. It is apparent from a
comparison of genomic data on the mouse and
human, and from comparative mammalian neu-
roanatomy (169), that the morphological and
behavioral diversity found in mammals is un-
derpinned by a similar gene repertoire and sim-
ilar neuroanatomies. For example, when one
compares a pygmy marmoset (which is only 4
inches tall and weighs about 6 ounces) to a
chimpanzee, the brain volume of this minute
primate is found to be only about 1.5 cm³, two
orders of magnitude less than that of a chimp
and three orders less than that of humans. Yet
the neuroanatomies of all three brains are strik-
ingly similar, and the behavioral characteristics
of the pygmy marmoset are little different from
those of chimpanzees. Between humans and
chimpanzees, the gene number, gene structures
and functions, chromosomal and genomic or-
ganizations, and cell types and neuroanatomies
are almost indistinguishable, yet the develop-
mental modifications that predisposed human
lineages to cortical expansion and development
of the larynx, giving rise to language, culminat-
ed in a massive singularity that by even the
simplest of criteria made humans more com-
plex in a behavioral sense.

Simple examination of the number of neu-
rons, cell types, or genes or of the genome
size does not alone account for the differenc- 
es in complexity that we observe. Rather, it is 
the interactions within and among these sets 
that result in such great variation. In addition, 
it is possible that there are "special cases" of 
regulatory gene networks that have a dispro- 
portionate effect on the overall system. We 
have presented several examples of "regula-
tory genes" that are significantly increased in
the human genome compared with the fly and
worm. These include extracellular ligands
and their cognate receptors (e.g., wnt, friz-
zled, TGF-β, ephrins, and connexins), as well
as nuclear regulators (e.g., the KRAB and 
homeodomain transcription factor families), 
where a few proteins control broad develop- 
mental processes. The answers to these 
"complexities" perhaps lie in these expanded 
gene families and differences in the regulato- 
ry control of ancient genes, proteins, path- 
ways, and cells. 



8.5 Beyond single components

While few would disagree with the intuitive
conclusion that Einstein's brain was more
complex than that of Drosophila, closer com-
parisons such as whether the set of predicted
human proteins is more complex than the
protein set of Drosophila, and if so, to what
degree, are not straightforward, since protein,
protein domain, or protein-protein interaction
measures do not capture context-dependent
interactions that underpin the dynamics un-
derlying phenotype.

Currently, there are more than 30 different
mathematical descriptions of complexity (170).
However, we have yet to understand the math-
ematical dependency relating the number of
genes with organism complexity. One pragmat-
ic approach to the analysis of biological sys-
tems, which are composed of nonidentical ele-
ments (proteins, protein complexes, interacting
cell types, and interacting neuronal popula-
tions), is through graph theory (171). The ele-
ments of the system can be represented by the
vertices of complex topographies, with the edg-
es representing the interactions between them.
Examination of large networks reveals that they
can self-organize, but more important, they can
be particularly robust. This robustness is not
due to redundancy, but is a property of inho-
mogeneously wired networks. The error toler-
ance of such networks comes with a price; they
are vulnerable to the selection or removal of a
few nodes that contribute disproportionately to
network stability. Gene knockouts provide an
illustration. Some knockouts may have minor
effects, whereas others have catastrophic effects
on the system. In the case of vimentin, a sup-
posedly critical component of the cytoplasmic
intermediate filament network of mammals, the
knockout of the gene in mice reveals them to be
reproductively normal, with no obvious pheno-
typic effects (172), and yet the usually conspic-
uous vimentin network is completely absent.
On the other hand, ~30% of knockouts in
Drosophila and mice correspond to critical
nodes whose reduction in gene product, or total
elimination, causes the network to crash most
of the time, although even in some of these
cases, phenotypic normalcy ensues, given the
appropriate genetic backgrounds. Thus, there are
no "good" genes or "bad" genes, but only net-
works that exist at various levels and at differ-
ent connectivities, and at different states of
sensitivity to perturbation. Sophisticated math-
ematical analysis needs to be constantly evalu-
ated against hard biological data sets that spe-
cifically address network dynamics. Nowhere is
this more critical than in attempts to come to
grips with "complexity," particularly because
deconvoluting and correcting complex net-
works that have undergone perturbation, and
have resulted in human diseases, is the greatest
significant challenge now facing us.
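
The contrast between error tolerance and vulnerability to loss of a few key nodes can be illustrated with a toy graph. The sketch below is not a model of any real gene or protein network: it assumes a preferential-attachment growth rule as one simple way to obtain an inhomogeneously wired graph, and all function names, sizes, and the random seed are hypothetical choices made for illustration.

```python
import random
from collections import deque

def preferential_attachment_graph(n, m, rng):
    """Grow a graph in which each new node links to m existing nodes chosen
    with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    endpoints = []  # every node appears once per edge it touches
    for i in range(m + 1):  # small fully connected starting core
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj

def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def compare_failure_modes(n=1000, m=2, fraction=0.1, seed=0):
    """Return (size after random removal, size after removing top hubs)."""
    rng = random.Random(seed)
    adj = preferential_attachment_graph(n, m, rng)
    k = int(n * fraction)
    random_removed = rng.sample(sorted(adj), k)
    hubs_removed = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    return (largest_component_size(adj, random_removed),
            largest_component_size(adj, hubs_removed))

after_random, after_hubs = compare_failure_modes()
print("largest component after random removal:", after_random)
print("largest component after hub removal:   ", after_hubs)
```

In graphs of this kind, deleting the same number of nodes does far more damage when the most connected nodes are chosen, which mirrors the knockout contrast described above in a purely schematic way.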

It has been predicted for the last 15 years
that complete sequencing of the human ge-
nome would open up new strategies for hu-
man biological research and would have a
major impact on medicine, and through med-
icine and public health, on society. Effects on
biomedical research are already being felt.
This assembly of the human genome se-
quence is but a first, hesitant step on a long
and exciting journey toward understanding
the role of the genome in human biology. It
has been possible only because of innova-
tions in instrumentation and software that
have allowed automation of almost every step
of the process from DNA preparation to an-
notation. The next steps are clear: We must
define the complexity that ensues when this
relatively modest set of about 30,000 genes is
expressed. The sequence provides the frame-
work upon which all the genetics, biochem-
istry, physiology, and ultimately phenotype
depend. It provides the boundaries for scien-
tific inquiry. The sequence is only the first
level of understanding of the genome. All
genes and their control elements must be
identified; their functions, in concert as well
as in isolation, defined; their sequence varia-
tion worldwide described; and the relation
between genome variation and specific phe-
notypic characteristics determined. Now we
know what we have to explain.

Another paramount challenge awaits:
public discussion of this information and its
potential for improvement of personal health.
Many diverse sources of data have shown
that any two individuals are more than 99.9%
identical in sequence, which means that all
the glorious differences among individuals in
our species that can be attributed to genes
fall in a mere 0.1% of the sequence. There
are two fallacies to be avoided: determinism, 
the idea that all characteristics of the person 
are "hard-wired" by the genome; and reduc- 
tionism, the view that with complete knowl- 
edge of the human genome sequence, it is 
only a matter of time before our understand- 
ing of gene functions and interactions will 
provide a complete causal description of hu- 
man variability. The real challenge of human 
biology, beyond the task of finding out how 
genes orchestrate the construction and main- 
tenance of the miraculous mechanism of our 
bodies, will lie ahead as we seek to explain 
how our minds have come to organize 
thoughts sufficiently well to investigate our
own existence.

EDITORIAL

THE HUMAN GENOME

A historic moment for the scientific endeavor.

Humanity has been given a great gift. With the completion of the human
genome sequence, we have received a powerful tool for unlocking the
secrets of our genetic heritage and for finding our place among the other
participants in the adventure of life.

This week's issue of Science contains the report of the sequencing of
the human genome from a group of authors led by Craig Venter of Celera
Genomics. The report of the sequencing of the human genome from the
publicly funded consortium of laboratories led by Francis Collins appears
in this week's Nature. This stunning achievement has been portrayed,
often unfairly, as a competition between two
ventures, one public and one private. That characterization detracts from
the awesome accomplishment jointly unveiled this week. In truth, each
project contributed to the other. The inspired vision that launched the
publicly funded project roughly 10 years ago reflected, and now rewards,
the confidence of those who believe that the pursuit of large-scale funda-
mental problems in the life sciences is in the national interest. The technical
innovation and drive of Craig Venter and his colleagues made it possible
to celebrate this accomplishment far sooner than was believed possible.
Thus, we can salute what has become, in the end, not a contest but a
marriage (perhaps encouraged by shotgun) between public funding and
private entrepreneurship.

There are excellent scientific reasons for applauding an outcome that
has given us two winners. Two sequences are better than one; the opportunity for comparison and con-
vergence is invaluable. Indeed, a real-world proof of the importance of access to both sets of data can
be found in the pages of this issue of Science, in the comparative analysis by Olivier et al. (p. 1298).

Although we have made the point before, it is worth repeating that the sequencing of the human
genome represents, not an ending, but the beginning of a new approach to biology. As Galas says in
his Viewpoint (p. 1257), the knowledge that all of the genetic components of any process can be
identified will give extraordinary new power to scientists. Because of this breakthrough, research
can evolve from analyzing the effects of individual genes to a more integrated view that examines
whole ensembles of genes as they interact to form a living human being. Several articles in this issue
highlight how this approach is already beginning to revolutionize the way we look at human disease.

This has been a massive project, on a scale unparalleled in the history of biology, but of course
it has built on the scientific insights of centuries of investigators. By coincidence, this landmark
announcement falls during the week of the anniversary of the birth of Charles Darwin. Darwin's
message that the survival of a species can depend on its ability to evolve in the face of change is
peculiarly pertinent to discussions that have gone on in the past year over access to the Celera data.
(Full information regarding the agreements that were reached to make the data available can be
found at www.sciencemag.org/feature/data/announcement/gsp.shl.) We are willing to be flexible in
allowing data repositories other than the traditional GenBank, while insisting on access to all the
data needed to verify conclusions. In this domain, change is everywhere: Commercial researchers
are producing more and more potentially valuable sequences, yet (at least in the United States)
laws governing databases provide scant protection against piracy. Had the Celera data been kept se-
cret, it would have been a serious loss to the scientific community. We hope that our adaptability in
the face of change will enable other proprietary data to be published after peer review, in a way that
satisfies our continuing commitment to full access.

It should be no surprise that an achievement so stunning, and so carefully watched, has created
new challenges for the scientific venture. Science is proud to have played a role in bringing this
discovery onto the public stage. It is literally true that this is a historic moment for the scientific en-
deavor. The human genome has been called the Book of Life. Rather, it is a library, in which, with
rules that encourage exploration and reward creativity, we can find many of the books that will
help define us and our place in the great tapestry of life.

Barbara R. Jasny and Donald Kennedy

AC114490 152428 bp DNA linear PRI 30-APR-2002

Homo sapiens chromosome 1 clone RP11-244H3, complete sequence.
AC114490 AL354876
AC114490.2 GI:20340495
HTG. 

Homo sapiens (human) 
Homo sapiens 

Eukaryota; Metazoa; Chordata; 
Mammalia; Eutheria; Primates; 

1 (bases 1 to 152428)
Kaul,R.K., Olson,M.V., Zhou,Y.
Saenphimmachak,C., Phelps,K.A.
Haugen,E.D.
Direct Submission
Unpublished

2 (bases 1 to 152428)

Kaul,R.K., Olson,M.V., Raymond,C. and Haugen,E.D.
Direct Submission

Submitted (09-MAR-2002) Genome Center, University of Washington,
Box 352145, Seattle, WA 98195, USA
Kaul,R.K., Olson,M.V., Zhou,Y.
Saenphimmachak,C., Phelps,K.A.
Haugen,E.D.
Direct Submission

Submitted (30-APR-2002) Genome Center, University of Washington,
Box 352145, Seattle, WA 98195, USA

On Apr 30, 2002 this sequence version replaced gi:19310309.
Genome Center

Center: University of Washington Genome Center

Center Code: UWGC
Contact: uwgchtgs@u.washington.edu

Drafting Center: SC
Project Information

Center project name: chr-1
Summary Statistics

Sequencing vector: plasmid; 31% of reads

Sequencing vector: plasmid; L08752; 69% of reads
Assembly program: Phrap; version 0.990319

identify and encode novel protein kinases (KIN) expressed

HUMAN KINASE HOMOLOGS

FIELD OF THE INVENTION

The present invention is in the field of molecular biology; 
more particularly, the present invention describes nucleic 
acid sequences for novel human kinase homologs. 

BACKGROUND OF THE INVENTION 

Kinases regulate many different cell proliferation, 
differentiation, and signalling processes by adding phos- 
phate groups to proteins. Uncontrolled signalling has been 
implicated in inflammation, oncogenesis, arteriosclerosis, 
and psoriasis. Reversible protein phosphorylation is the 
main strategy for controlling activities of eukaryotic cells. It 
is estimated that more than 1000 of the 10,000 proteins 
active in a typical mammalian cell are phosphorylated. The 
high energy phosphate which drives activation is generally 
transferred from adenosine triphosphate molecules (ATP) to
a particular protein by protein kinases and removed from 
that protein by protein phosphatases. 

Phosphorylation occurs in response to extracellular sig-
nals (hormones, neurotransmitters, growth and differentia-
tion factors, etc.), cell cycle checkpoints, and environmental
or nutritional stresses and is roughly analogous to the
turning on of a molecular switch. When the switch goes on, the
appropriate protein kinase activates a metabolic enzyme,
regulatory protein, receptor, cytoskeletal protein, ion chan- 
nel or pump, or transcription factor. 

The kinases comprise the largest known protein family, a 
superfamily of enzymes with widely varied functions and 
specificities. They are usually named after their substrate, 
their regulatory molecules, after some aspect of a mutant 
phenotype or arbitrarily. Almost all kinases contain a similar 
250-300 amino acid catalytic domain. The N-terminal 
domain, which contains subdomains I-IV, generally folds 
into a two-lobed structure and binds and orients the ATP (or 
GTP) donor molecule. The larger C-terminal lobe, which
contains subdomains VIA-XI, binds the protein substrate 
and carries out the transfer of the gamma phosphate from 
ATP to the hydroxyl group of a serine, threonine, or tyrosine 
residue. Subdomain V spans the two lobes. 

The kinases may be categorized into families by the 
different amino acid sequences (generally between 5 and 
100 residues) located on either side of, or inserted into loops 
of, the kinase domain. These added amino acid sequences 
allow the regulation of each kinase as it recognizes and 
interacts with its target protein. The primary structure of the 
kinase domains is conserved and can be further subdivided 
into 12 subdomains. The following residues are relatively 
(-95%) invariant: G50 and G52 in subdomain 1, K72 in 
subdomain II, Gg^ in subdomain III, E208 in subdomain VIII, 
D220 and G225 in subdomain IX, and the motifs or patterns 
of amino acids in subdomains VIB, VIII and IX (Hardie G. 
and Hanks S. (1995) The Protein Kinase Facts Books, I and 
II, Academic Press, San Diego, Calif.). 

The cyclin dependent protein kinase (cdk) family includes 
proteins which are turned on and off as the cell proceeds 
through the cell cycle. A cdk is active as a kinase only when 
it is bound to a cyclin. Cdk activation simultaneously 
requires both the addition of a high energy phosphate to a 
threonine residue by a kinase and the removal of a 
covalently-bound phosphate from a specific tyrosine residue 
by a phosphatase. The concentration of some cyclins rises 
gradually through a particular part of the cell cycle until their 
targeted proteolysis ends the coordinated interaction among 
the cyclin, kinase, and phosphatase molecules. 

The second-messenger dependent protein kinases primarily 
mediate the effects of second messengers such as cyclic 
AMP (cAMP), cyclic GMP, inositol triphosphate, 
phosphatidylinositol 3,4,5-triphosphate, cyclic ADP-ribose, 
arachidonic acid and diacylglycerol. For purposes of 
example, the structure and function of cyclic AMP- 
dependent protein kinase (A-kinase) will be described. 
Mammalian cells generally contain at least two forms of 
A-kinase: type 1, which is cytosolic, and type 2, which is 
bound to plasma membrane, nuclear membrane or microtu- 
bules. In its inactive state, A-kinase consists of a complex of 
two catalytic subunits and two regulatory subunits. When 
each regulatory subunit has bound two molecules of cAMP, 
the catalytic subunit is activated and can transfer a high 
energy phosphate from ATP to the serine or threonine of a 
substrate protein. Substrate proteins are usually marked by 
the presence of two or more basic amino acids on their 
amino terminal sides. A-kinase is important in metabolism 
of glycogen, in inactivation of phosphatase inhibitor 
protein, in transcription of genes which contain a regulatory 
region called the cAMP response element (CRE), and in 
regulation of the ion channels of olfactory neurons. 

Protein kinase C (PKC) is a water-soluble, Ca2+-dependent 
kinase, commonly found in brain tissue, which 
moves to the plasma membrane in the presence of Ca2+ ions. 
Approximately half of the known isoforms of PKC are 
activated initially by diacylglycerol and phosphatidylserine. 
Prolonged activation of PKC depends on continued production 
of diacylglycerol molecules, which are formed when 
phospholipases cleave phosphatidylcholine. In nerve cells, 
PKC phosphorylates ion channels and alters the excitability 
of the cell membrane. In other cells, activation of PKC 
increases gene transcription either by triggering a protein 
kinase cascade which activates a regulatory element (much 
like CRE above) or by phosphorylating and deactivating an 
inhibitor of the regulatory protein. 

Ca2+/calmodulin-dependent protein kinases (CaM-kinases) 
mediate most of the actions of Ca2+ in human cells. 
The CaM-kinases include enzymes with narrow substrate 
specificity, such as myosin light chain kinase, which activates 
smooth muscle contraction, and phosphorylase kinase, which 
activates glycogen breakdown, and the multifunctional 
enzyme CaM-kinase II, which is found in all cells. 
Phosphorylase kinase has four subunits: γ is the catalytic moiety 
and α, β and δ are regulatory. Since subunits α and β are 
phosphorylated by A-kinase and subunit δ is Ca2+/calmodulin, 
glycogen breakdown can be activated by either 
cAMP or Ca2+. 

CaM-kinase II is particularly enriched in catecholamine 
synapses. In those neurons, Ca2+ influx stimulates both the 
release of dopamine, noradrenaline or adrenaline and also 
their resynthesis through the activation of CaM-kinase II. 
Although the main role of CaM-kinase II is phosphorylation 
of tyrosine hydroxylase, the rate-limiting enzyme of 
catecholamine synthesis, CaM-kinase II also autophosphorylates 
and remains active until phosphatases overwhelm it. 

Transmembrane protein-tyrosine kinases are receptors for 
most growth factors. The first characterized receptor for 
epidermal growth factor (EGF) is a single pass transmembrane 
protein of about 1200 amino acids with an extracellular 
glycosylated portion that interacts with the 53 amino 
acid EGF molecule. Binding activates the transfer of a 
phosphate group from ATP to selected tyrosine side chains 
of the receptor and other specific proteins. Other protein 
receptors with similar structure include those for the following 
growth and differentiation factors (GF): platelet derived GF, 
fibroblast GF, hepatocyte GF, insulin and insulin-like GFs, nerve 
GF, vascular endothelial GF, macrophage colony stimulating 
factor, etc. Each receptor phosphorylates itself upon 
dimerization to initiate the intracellular signalling cascade. 

Many protein-tyrosine kinases lack transmembrane 
regions and form a complex with the intracellular regions of 
other cell surface receptors. The best known of these 
non-receptor PTKs (NR-PTKs) are 
the Src kinase family (Src, Yes, Fgr, Fyn, Lck, Lyn, Hck, 
Blk, etc) and the Janus kinase family (Jak1, Jak2, Jak3, 
Tyk2, etc). The Src PTKs are located on the cytoplasmic side 
of the plasma membrane and are characterized by Src 
homology regions 2 and 3 (SH2 and SH3). Src PTKs 
recognize short peptide motifs bearing phosphotyrosine or 
proline residues, respectively, and mediate protein-protein 
interactions that regulate a whole range of intracellular 
signalling molecules. Janus PTKs contain PTK or PTK-like 
domains and interact with growth hormone, prolactin, and 
some of the same cytokine receptors as Src PTKs. The 
cytokine receptors are unique both in their ability to recruit 
multiple PTKs and in the diversity of their intracellular 
domains which allow flexibility in their responses within 
different cell types (Taniguchi T. (1995) Science 
268:251-55). Src and Jak kinases were first identified as the 
products of mutant oncogenes in cancer cells where their 
activation was no longer subject to normal cellular controls. 

Extracellular signalling proteins such as transforming 
growth factor-β (TGF-β), activins, bone morphogenetic 
protein, and related members of the TGF-β superfamily 
interact with receptor serine/threonine kinases. Like EGF 
above, these receptor kinases have a single pass transmembrane 
domain with a serine/threonine kinase domain on the 
cytosolic side of the plasma membrane. The signalling 
pathways which are activated by binding the extracellular 
signalling molecules are presently under investigation. 

Mitogen-activated protein (MAP) kinases also regulate 
intracellular signalling pathways. They mediate signal transduction 
from cell surface to nuclei via phosphorylation 
cascades. Several subgroups have been identified, and each 
manifests different substrate specificities and responds to 
distinct extracellular stimuli (Egan S. E. and Weinberg R. A. 
(1993) Nature 365:781-783). 

MAP kinase signalling pathways are present in mamma- 
lian cells as well as in yeast. The extracellular stimuli which 
activate mammalian pathways include epidermal growth 
factor (EGF), ultraviolet light, hyperosmolar medium, heat 
shock, endotoxic lipopolysaccharide (LPS), and 
pro-inflammatory cytokines such as tumor necrosis factor (TNF) 
and interleukin-1 (IL-1). In Saccharomyces cerevisiae, 
exposure to mating pheromone or hyperosmolar environments 
activates the various MAP kinase signalling pathways. 

Mammalian cells have at least three subgroups of MAP 
kinases (Derijard B. et al (1995) Science 267:682-5), each 
distinguished by a tripeptide motif. They are extracellular 
signal-regulated protein kinases (ERK) characterized by 
Thr-Glu-Tyr; c-Jun amino-terminal kinases (JNK) characterized 
by Thr-Pro-Tyr; and p38 kinase characterized by 
Thr-Gly-Tyr. Each subgroup is activated by dual phosphorylation 
of threonine and tyrosine residues by MAP kinase 
kinases located upstream in the phosphorylation cascade. 
Activated MAP kinases, in turn, phosphorylate downstream 
effectors ultimately leading to intracellular changes. 
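
The three tripeptide motifs named above can be written as a small lookup table. The following Python sketch is illustrative only and is not part of the disclosed methods; the function name `find_mapk_motifs` and the example peptide are hypothetical. It scans an amino acid sequence in one-letter code and reports each occurrence of Thr-Glu-Tyr, Thr-Pro-Tyr or Thr-Gly-Tyr. A motif match alone does not establish that a protein is a MAP kinase.

```python
# Tripeptide motifs named in the text, in one-letter amino acid code.
MAPK_MOTIFS = {
    "TEY": "ERK",  # Thr-Glu-Tyr
    "TPY": "JNK",  # Thr-Pro-Tyr
    "TGY": "p38",  # Thr-Gly-Tyr
}


def find_mapk_motifs(protein_seq):
    """Return (0-based position, motif, subgroup) for each motif occurrence."""
    seq = protein_seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        tripeptide = seq[i:i + 3]
        if tripeptide in MAPK_MOTIFS:
            hits.append((i, tripeptide, MAPK_MOTIFS[tripeptide]))
    return hits


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the function.
    print(find_mapk_motifs("AAFTEYVATR"))
```

Positions are reported 0-based from the amino terminus of the sequence supplied.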

The ERK signal transduction pathway is activated via 
tyrosine kinase receptors on the plasmalemma. When 
growth factors bind to these receptors, the receptors bind to 
noncatalytic Src homology (SH) adaptor proteins (SH2-SH3-SH2) and a 
guanine nucleotide releasing protein (GNRP). GNRP 
reduces GTP and activates Ras proteins, members of the 
large family of guanine nucleotide binding proteins 
(G-proteins). Activated Ras proteins bind to the protein kinase 
c-Raf-1 and activate the Raf-1 proteins. The activated Raf-1 
kinase subsequently phosphorylates MAP kinase kinase 
(MKK) which, in turn, activates ERKs. 

ERKs are proline-directed protein kinases which phosphorylate 
Ser/Thr-Pro motifs. For example, cytoplasmic phospholipase 
A2 (cPLA2) and transcription factor Elk-1 are substrates 
of ERKs. The ERKs phosphorylate Ser505 of cPLA2, 
thereby increasing its enzymatic activity and resulting in 
release of arachidonic acid and the formation of lysophospholipids 
from membrane phospholipids. Likewise, phosphorylation 
of the transcription factor Elk-1 by ERK ultimately 
increases transcriptional activity. 

JNK is distantly related to ERK and is similarly 
activated by dual phosphorylation of Thr and Tyr and by 
MKK4 (Davis R (1994) TIBS 19:470-473). The JNK signal 
transduction pathway is also initiated by ultraviolet light, 
osmotic stress, and the pro-inflammatory cytokines, TNF 
and IL-1. Phosphorylation of Ser63 and Ser73 in the 
NH2-terminal domain of the transcription factor c-Jun increases 
transcriptional activity. 

p38 is a 41 kD protein containing 360 amino acids. Its 
dual phosphorylation is activated by MKK3 and MKK4, 
heat shock, hyperosmolar medium, IL-1 or LPS endotoxin 
(Han J. et al (1994) Science 265:808-811). Sepsis produced 
by LPS is characterized by fever, chills, tachypnea, and 
tachycardia, and severe cases may result in septic shock 
which includes hypotension and multiple organ failure. 

Cells respond to LPS as a stress signal because it alters 
normal cellular processes and induces the release of systemic 
mediators such as TNF. CD14 is a 
glycosylphosphatidylinositol-anchored membrane glycoprotein 
which serves as an LPS receptor on the plasmalemma 
of monocytic cells. The binding of LPS to CD14 causes 
rapid protein tyrosine phosphorylation of the 44- and 42- or 
40-kD isoforms of MAP kinases. Although they bind LPS, 
these MAP kinase isoforms do not appear to belong to the 
p38 subgroup. 

A detailed understanding of kinase pathways and signal 
transduction is beginning to reveal some mechanisms for 
interceding in the progression of inflammatory illnesses and 
of uncontrolled cell proliferation. The cDNAs, 
oligonucleotides, peptides and antibodies for the human 
kinases, which are the subject of this invention and are listed 
in Table 1, provide a plurality of tools for studying signalling 
cascades in various cells and tissues and for diagnosing and 
selecting inhibitors or drugs with the potential to intervene 
in various disorders or diseases in which altered kinase 
expression is implicated. The disorders or diseases include, 
but are not limited to, human X-linked agammaglobulinemia, 
nonspherocytic hemolytic anemia, atherosclerosis, carcino- 
mas (breast, ovary, renal, squamous cell and prostate), 
diabetes, gliomas, glomerular disease, hepatomegaly, Kaposi's 
sarcoma, lymphoblastic and myelogenous leukemias, 
myoglobinuria, peptic ulcer disease, psoriasis, pulmonary 
fibrosis, restenosis, and septic shock due to cholera, 
Clostridium difficile, E. coli and Shigella (Isselbacher K. J. 
et al (1994) Harrison's Principles of Internal Medicine, 
McGraw-Hill, New York City; Levitzki A. and A. Gazit 
(1995) Science 267:1782-88). 

SUMMARY OF THE INVENTION 

The subject invention provides unique polynucleotides 
(SEQ ID NOs 1-44) which have been identified as novel 
human kinases (kin). These partial cDNAs were identified 
among the polynucleotides which comprise various Incyte 
cDNA libraries. 

The invention comprises polynucleotides which are 
complementary to the kin sequences (SEQ ID NOs 1-44). 

The invention also comprises the use of kin sequences to 
identify and obtain full length human kinase cDNAs such 
as SEQ ID NO 45. 

The invention further comprises the use of oligomers 
from these kin sequences in a kinase kit which can be used 
to identify a disorder or disease with altered kinase expres- 
sion and provide a method for monitoring progress of a 
patient during drug therapy. 

Aspects of the invention include use of kin sequences or 
recombinant nucleic acids derived from them to produce 
purified peptides. Still further aspects of the invention use 
these purified peptides to identify antibodies or other mol- 
ecules with inhibitory activity toward a particular kinase, 
group of kinases or disease. 

In addition, the invention comprises the use of kin specific 
antibodies in assays to identify a disorder or disease with 
altered kinase expression and provides a method to monitor 
the progress of a patient during drug therapy. 

DESCRIPTION OF THE FIGURE 

FIGS. 1A and 1B display the full length nucleotide 
sequence for human MAP kinase from stomach tissue (SEQ 
ID NO 45; Incyte Clone 214915E) and its predicted amino 
acid sequence. 

DETAILED DESCRIPTION OF THE 
INVENTION 

Definitions 

As used herein, the abbreviation for kinase in lower case 
(kin) refers to a gene, cDNA, RNA or nucleic acid sequence 
while the upper case version (KIN) refers to a protein, 
polypeptide, peptide, oligopeptide, or amino acid sequence. 

An "oligonucleotide" or "oligomer" is a stretch of nucle- 
otide residues which has a sufficient number of bases to be 
used in a polymerase chain reaction (PCR). These short 
sequences are based on (or designed from) genomic or 
cDNA sequences and are used to amplify, confirm, or reveal 
the presence of an identical, similar or complementary DNA 
or RNA in a particular cell or tissue. Oligonucleotides or 
oligomers comprise portions of a DNA sequence having at 
least about 10 nucleotides and as many as about 50 
nucleotides, preferably about 15 to 30 nucleotides. They are 
chemically synthesized and may be used as probes. 
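
The length ranges in this definition can be checked mechanically. The Python sketch below is a hedged illustration, not a disclosed method: the function names and the exact cut-offs are assumptions, since the text gives the limits only as "about" 10, 15, 30 and 50 nucleotides. It also derives the complementary strand, which is the sequence an oligomer would hybridize to.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def oligo_length_class(seq):
    """Classify an oligomer by length.

    Hard cut-offs are an assumption made for this sketch; the text
    states the limits only approximately.
    """
    n = len(seq)
    if n < 10 or n > 50:
        return "outside range"
    if 15 <= n <= 30:
        return "preferred"
    return "acceptable"


if __name__ == "__main__":
    primer = "ATGGCTAGCTAGGATCCGTA"  # hypothetical 20-mer
    print(oligo_length_class(primer), reverse_complement(primer))
```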

"Probes" arc nucleic acid sequences of variable length, 
preferably between at least about 10 and as many as about 
6,000 nucleotides, depending on use. They are used in the 
detection of identical, similar, or complementary nucleic 
acid sequences. Longer length probes are usually obtained 
from a natural or recombinant source, are highly specific and 
much slower to hybridize than oligomers. They may be 
single- or double-stranded and carefully designed to have 
specificity in PCR, hybridization membrane-based, or 
ELISA-like technologies. 

"Reporter" molecules are chemical moieties used for 
labelling a nucleic or amino acid sequence. They include, 
but are not limited to, radionuclides, enzymes, fluorescent, 
chemiluminescent, or chromogenic agents. Reporter molecules 
associate with, establish the presence of, and may 
allow quantification of a particular nucleic or amino acid 
sequence. 

A "portion" or "fragment" of a polynucleotide or nucleic 
acid comprises all or any part of the nucleotide sequence 
having fewer nucleotides than about 6 kb, preferably fewer 
than about 1 kb, which can be used as a probe. Such probes 
may be labelled with reporter molecules using nick 
translation, Klenow fill-in reaction, PCR or other methods 
well known in the art. After pretesting to optimize reaction 
conditions and to eliminate false positives, nucleic acid 
probes may be used in Southern, northern or in situ hybrid- 
izations to determine whether DNA or RNA encoding the 
protein is present in a biological sample, cell type, tissue, 
organ or organism. 

"Recombinant nucleotide variants" are polynucleotides 
which encode a protein. They may be synthesized by making 
use of the "redundancy" in the genetic code. Various codon 
substitutions, such as the silent changes which produce 
specific restriction sites or codon usage-specific mutations, 
may be introduced to optimize cloning into a plasmid or 
viral vector or expression in a particular prokaryotic or 
eukaryotic host system, respectively. 

"Linkers" are synthesized palindromic nucleotide 
sequences which create internal restriction endonuclease 
sites for ease of cloning the genetic material of choice into 
various vectors. "Polylinkers" are engineered to include 
multiple restriction enzyme sites and provide for the use of 
both those enzymes which leave 5' and 3' overhangs such as 
BamHI, EcoRI, PstI, KpnI and HindIII or which provide a 
blunt end such as EcoRV, SnaBI and StuI. 

"Control elements" or "regulatory sequences" are those 
nontranslated regions of the gene or DNA such as enhancers, 
promoters, introns and 3' untranslated regions which interact 
with cellular proteins to carry out replication, transcription, 
and translation. They may occur as boundary sequences or 
even split the gene. They function at the molecular level and 
along with regulatory genes are very important in 
development, growth, differentiation and aging processes. 

"Chimeric" molecules are polynucleotides or polypeptides 
which are created by combining one or more of the 
nucleotide sequences of this invention (or their parts) with 
additional nucleic acid sequence(s). Such combined 
sequences may be introduced into an appropriate vector and 
expressed to give rise to a chimeric polypeptide which may 
be expected to be different from the native molecule in one 
or more of the following kinase characteristics: cellular 
location, distribution, ligand-binding affinities, interchain 
affinities, degradation/turnover rate, signalling, etc. 

"Active" is that state which is capable of being useful or 
of carrying out some role. It specifically refers to those 
forms, fragments, or domains of an amino acid sequence 
which display the biologic and/or immunogenic activity 
characteristic of the naturally occurring kinase. 

"Naturally occurring KIN" refers to a polypeptide produced 
by cells which have not been genetically engineered 
or which have been genetically engineered to produce the 
same sequence as that naturally produced. Specifically 
contemplated are various polypeptides which arise from 
post-translational modifications. Such modifications of the 
polypeptide include but are not limited to acetylation, 
carboxylation, glycosylation, phosphorylation, lipidation 
and acylation. 

"Derivative" refers to those polypeptides which have been 
chemically modified by such techniques as ubiquitination, 
labelling (see above), pegylation (derivatization with 
polyethylene glycol), and chemical insertion or substitution of 
amino acids such as ornithine which do not normally occur 
in human proteins. 

"Recombinant polypeptide variant" refers to any polypeptide 
which differs from naturally occurring KIN by amino 
acid insertions, deletions and/or substitutions, created using 
recombinant DNA techniques. Guidance in determining 
which amino acid residues may be replaced, added or 
deleted without abolishing characteristics of interest may be 
found by comparing the sequence of KIN with that of related 
polypeptides and minimizing the number of amino acid 
sequence changes made in highly conserved regions. 

Amino acid "substitutions" are defined as one for one 
amino acid replacements. They are conservative in nature 
when the substituted amino acid has similar structural and/or 
chemical properties. Examples of conservative replacements 
are substitution of a leucine with an isoleucine or valine, an 
aspartate with a glutamate, or a threonine with a serine. 
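
As a minimal sketch of the examples just given, the Python fragment below encodes only the three replacement families named in this paragraph (leucine with isoleucine or valine, aspartate with glutamate, threonine with serine). The grouping and the function name `is_conservative` are assumptions made for illustration; they are not an exhaustive or authoritative classification of conservative substitutions.

```python
# Families limited to the examples in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"L", "I", "V"},  # leucine, isoleucine, valine
    {"D", "E"},       # aspartate, glutamate
    {"T", "S"},       # threonine, serine
]


def is_conservative(original, replacement):
    """True if a one-for-one replacement stays within one example family."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residue: not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("L", "I"), is_conservative("L", "D"))
```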

Amino acid "insertions" or "deletions" are changes to or 
within an amino acid sequence. They typically fall in the 
range of about 1 to 5 amino acids. The variation allowed in 
a particular amino acid sequence may be experimentally 
determined by producing the peptide synthetically or by 
systematically making insertions, deletions, or substitutions 
of nucleotides in the kin sequence using recombinant DNA 
techniques. 

A "signal or leader sequence" is a short amino acid 
sequence which is or can be used, when desired, to direct the 
polypeptide through a membrane of a cell. Such a sequence 
may be naturally present on the polypeptides of the present 
invention or provided from heterologous sources by recom- 
binant DNA techniques. 

An "oligopeptide" is a short stretch of amino acid residues 
and may be expressed from an oligonucleotide. It may be 
functionally equivalent to and either the same length as or 
considerably shorter than a "fragment", "portion", or 
"segment" of a polypeptide. Such sequences comprise a 
stretch of amino acid residues of at least about 5 amino acids 
and often about 17 or more amino acids, typically at least 
about 9 to 13 amino acids, and of sufficient length to display 
biologic and/or immunogenic activity. 

An "inhibitor" is a substance which retards or prevents a 
chemical or physiological reaction or response. Common 
inhibitors include but are not limited to antisense molecules, 
antibodies, antagonists and their derivatives. 

A "standard" is a quantitative or qualitative measurement 
for comparison. Preferably, it is based on a statistically 
appropriate number of samples and is created to use as a 
basis of comparison when performing diagnostic assays, 
running clinical trials, or following patient treatment profiles. 
The samples of a particular standard may be normal or 
similarly abnormal. 

"Animal" as used herein may be defined to include 
human, domestic (cats, dogs, etc), agricultural (cows, 
horses, sheep, goats, chicken, fish, etc) or test species (frogs, 
mice, rats, rabbits, simians, etc). 

"Disorders or diseases" in which altered kinase activity 
has been implicated specifically include, but are not limited 
to, human X-linked agammaglobulinemia, nonspherocytic 
hemolytic anemia, atherosclerosis, carcinomas (breast, 
ovary, renal, squamous cell and prostate), diabetes, gliomas, 
glomerular disease, hepatomegaly, Kaposi's sarcoma, 
lymphoblastic and myelogenous leukemias, myoglobinuria, 
peptic ulcer disease, psoriasis, pulmonary fibrosis, 
restenosis, and septic shock due to cholera, Clostridium 
difficile, E. coli and Shigella. 

Since the list of technical and scientific terms cannot be all 
encompassing, any undefined terms shall be construed to 
have the same meaning as is commonly understood by one 
of skill in the art to which this invention belongs. 
Furthermore, the singular forms "a", "an" and "the" include 
plural referents unless the context clearly dictates otherwise. 
For example, reference to a "restriction enzyme" or a "high 
fidelity enzyme" may include mixtures of such enzymes and 
any other enzymes fitting the stated criteria, or reference to 
the method includes reference to one or more methods for 
obtaining cDNA sequences which will be known to those 
skilled in the art or will become known to them upon reading 
this specification. 

Before the present sequences, variants, formulations and 
methods for making and using the invention are described, 
it is to be understood that the invention is not to be limited 
only to the particular sequences, variants, formulations or 
methods described. The sequences, variants, formulations 
and methodologies may vary, and the terminology used 
herein is for the purpose of describing particular embodi- 
ments. The terminology and definitions are not intended to 
be limiting since the scope of protection will ultimately 
depend upon the claims. 

DESCRIPTION OF THE INVENTION 

The present invention provides for purified partial protein 
kinase cDNAs which were expressed in various human 
tissues and isolated therefrom. These sequences were identified 
by their similarity to published or known open reading 
frames or untranslated control regions. Since protein kinases 
are associated with basic cellular processes such as cell 
proliferation, differentiation and cell signalling, these nucle- 
otide sequences are useful in the characterization of and 
delineation of normal and abnormal processes. Kinase 
nucleotide sequences are useful in diagnostic assays used to 
evaluate the role of a specific kinase in normal, diseased, or 
therapeutically treated cells. 

Purified kinase nucleotide sequences have numerous 
applications in techniques known to those skilled in the art 
of molecular biology. These techniques include their use as 
hybridization probes, for chromosome and gene mapping, in 
PCR technologies, in the production of sense or antisense 
nucleic acids, in screening for new therapeutic molecules, 
etc. These examples are well known and are not intended to 
be limiting. Furthermore, the nucleotide sequences disclosed 
herein may be used in molecular biology techniques that 
have not yet been developed, provided the new techniques 
rely on properties of nucleotide sequences that are currently 
known, including but not limited to such properties as the 
triplet genetic code and specific base pair interactions. 

As a result of the degeneracy of the genetic code, a 
multitude of kinase-encoding nucleotide sequences may be 
produced and some of these will bear only minimal homol- 
ogy to the endogenous sequence of any known and naturally 
occurring kinase. This invention has specifically contem- 
plated each and every possible variation of nucleotide 
sequence that could be made by selecting combinations 
based on possible codon choices. These combinations are 
made in accordance with the standard triplet genetic code as 
applied to the nucleotide sequence of naturally occurring 
kinases, and all such variations are to be considered as being 
specifically disclosed. 
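
To give a sense of scale for the degeneracy described above, the following Python sketch counts how many distinct nucleotide sequences encode a given peptide under the standard genetic code. It is an illustration only: the function name `count_encodings` and the example peptides are hypothetical, and stop codons are ignored. The table lists the number of codons for each amino acid in the standard code (61 sense codons in total).

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide):
    """Count the nucleotide sequences that encode a peptide (one-letter code)."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


if __name__ == "__main__":
    # Hypothetical tripeptides used only to exercise the function.
    for pep in ("MW", "TEY", "LSR"):
        print(pep, count_encodings(pep))
```

Because the count is a product over residues, it grows rapidly with peptide length, which is why the text speaks of a "multitude" of kinase-encoding sequences.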

Although the kinase nucleotide sequences and their 
derivatives or variants are preferably capable of identifying 
the nucleotide sequence of the naturally occurring kinase 
under optimized conditions, it may be advantageous to 
produce kinase-encoding nucleotide sequences possessing a 
substantially different codon usage. Codons can be selected 
to increase the rate at which expression of the peptide occurs 
in a particular prokaryotic or eukaryotic expression host in 
accordance with the frequency with which particular codons 
are utilized by the host. Other reasons for substantially 
altering the nucleotide sequence encoding the kinase without 
altering the encoded amino acid sequence include the 
production of RNA transcripts having more desirable 
properties, such as a longer half-life, than transcripts 
produced from the naturally occurring sequence. 

Nucleotide sequences encoding a kinase may be joined to 
a variety of other nucleotide sequences by means of well 
established recombinant DNA techniques (Sambrook J. et al 
(1989) Molecular Cloning: A Laboratory Manual, Cold 
Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; or 
Ausubel F. M. et al (1989) Current Protocols in Molecular 
Biology, John Wiley & Sons, New York City). Useful 
sequences for joining to the kinase include an assortment of 
cloning vectors such as plasmids, cosmids, lambda phage 
derivatives, phagemids, and the like. Vectors of interest 
include vectors for replication, expression, probe generation, 
sequencing, and the like. In general, vectors of interest may 
contain an origin of replication functional in at least one 
organism, convenient restriction endonuclease sensitive 
sites, and selectable markers for one or more host cell 
systems. 

PCR as described in U.S. Pat. Nos. 4,683,195; 4,800,195; 
and 4,965,188 provides additional uses for oligonucleotides 
based upon the kinase nucleotide sequence. Such oligomers 
are generally chemically synthesized, but they may be of 
recombinant origin or a mixture of both. Oligomers 
generally comprise two nucleotide sequences, one with sense 
orientation (5'→3') and one with antisense (3' to 5') 
employed under optimized conditions for identification of a 
specific gene or diagnostic use. The same two oligomers, 
nested sets of oligomers, or even a degenerate pool of 
oligomers may be employed under less stringent conditions 
for identification and/or quantitation of closely related DNA 
or RNA sequences. 

Full length genes may be cloned utilizing partial 
nucleotide sequence and various methods known in the art. 
Gobinda et al (1993; PCR Methods Applic 2:318-22) 
disclose "restriction-site PCR" as a direct method which uses 
universal primers to retrieve unknown sequence adjacent to 
a known locus. First, genomic DNA is amplified in the 
presence of primer to linker and a primer specific to the 
known region. The amplified sequences are subjected to a 
second round of PCR with the same linker primer and 
another specific primer internal to the first one. Products of 
each round of PCR are transcribed with an appropriate RNA 
polymerase and sequenced using reverse transcriptase. 
Gobinda et al present data concerning Factor IX for which 
they identified a conserved stretch of 20 nucleotides in the 
3' noncoding region of the gene. 

Inverse PCR is the first method to report successful 
acquisition of unknown sequences starting with primers 
based on a known region (Triglia T. et al (1988) Nucleic 
Acids Res 16:8186). The method uses several restriction 
enzymes to generate a suitable fragment in the known region 
of a gene. The fragment is then circularized by 
intramolecular ligation and used as a PCR template. Divergent primers 
are designed from the known region. The multiple rounds of 
restriction enzyme digestions and ligations that are 
necessary prior to PCR make the procedure slow and expensive 
(Gobinda et al, supra). 

Capture PCR (Lagerstrom M. et al (1991) PCR Methods 
Applic 1:111-19) is a method for PCR amplification of DNA 
fragments adjacent to a known sequence in human and YAC 

Although the restriction and ligation reactions are carried 
out simultaneously, the requirements for extension, 
immobilization and two rounds of PCR and purification prior to 
sequencing render the method cumbersome and time 
consuming. 

Parker J. D. et al (1991; Nucleic Acids Res 19:3055-60), 
teach walking PCR, a method for targeted gene walking 
which permits retrieval of unknown sequence. 
PromoterFinder™ is a new kit available from Clontech (Palo Alto, 
Calif.) which uses PCR and primers derived from p53 to 
walk in genomic DNA. Nested primers and special 
PromoterFinder libraries are used to detect upstream sequences 
such as promoters and regulatory elements. This process 
avoids the need to screen libraries and is useful in finding 
intron/exon junctions. 

Another new PCR method, "Improved Method for 
Obtaining Full Length cDNA Sequences" by Guegler et al, 
patent application Ser. No. 08/487,112, filed Jun. 7, 1995 and 
hereby incorporated by reference, employs XL-PCR 
(Perkin-Elmer, Foster City, Calif.) to amplify and extend 
partial nucleotide sequence into longer pieces of DNA. This 
method was developed to allow a single researcher to 
process multiple genes (up to 20 or more) at one time and to 
obtain an extended (possibly full-length) sequence within 
6-10 days. This new method replaces methods which use 
labelled probes to screen plasmid libraries and allow one 
researcher to process only about 3-5 genes in 14-40 days. 

In the first step, which can be performed in about two 
days, any two of a plurality of primers are designed and 
synthesized based on a known partial sequence. In step 2, 
which takes about six to eight hours, the sequence is 
extended by PCR amplification of a selected library. Steps 3 
and 4, which take about one day, are purification of the 
amplified cDNA and its ligation into an appropriate vector. 
Step 5, which takes about one day, involves transforming 
and growing up host bacteria. In step 6, which takes 
approximately five hours, PCR is used to screen bacterial clones for 
extended sequence. The final steps, which take about one 
day, involve the preparation and sequencing of selected 
clones. 

If the full length cDNA has not been obtained, the entire 
procedure is repeated using either the original library or 
some other preferred library. The preferred library may be 
one that has been size-selected to include only larger cDNAs 
or may consist of single or combined commercially 
available libraries, e.g. lung, liver, heart and brain from 
Gibco/BRL (Gaithersburg, Md.). The cDNA library may have been 
prepared with oligo (dT) or random priming. Random 
primed libraries are preferred in that they will contain more 
sequences which contain 5' ends of genes. A randomly 
primed library may be particularly useful if an oligo (dT) 
library does not yield a complete gene. It must be noted that 
the larger and more complex the protein, the less likely it is 
that the complete gene will be found in a single plasmid. 

A new method for analyzing either the size or the 
nucleotide sequence of PCR products is capillary electrophoresis. 
Systems for rapid sequencing are available from Perkin 
Elmer (Foster City, Calif.), Beckman Instruments (Fullerton, 
Calif.), and other companies. Capillary sequencing employs 
flowable polymers for electrophoretic separation, four 
different fluorescent dyes (one for each nucleotide) which are 
laser activated, and detection of the emitted wavelengths by 

DNA. As noted by Gobinda et al (supra), capture PCR also a charge coupled devise camera. Output/hght intensity is 
requires multiple restriction enzyme digestions and ligations 65 converted to electrical signal using appropriate software (eg. 

to place an engineered double-stranded sequence into an Genotyper™ and Sequence Navigators™ from Perkin 

unknown portion of the DNA molecule before PCR. Elmer) and the entire process from loading of samples to 
computer analysis and electronic data display is computer
controlled. Capillary electrophoresis provides greater reso- 
lution and is many times faster than standard gel based 
procedures. It is particularly suited to the sequencing of 
small pieces of DNA which might be present in limited 
amounts in a particular sample. The reproducible sequenc- 
ing of up to 350 bp of M13 phage DNA in 30 min has been 
reported (Ruiz-Martinez M. C. et al (1993) Anal Chem 
65:2851-8).

Another aspect of the subject invention is to provide for 
kinase hybridization probes which are capable of hybridiz- 
ing with naturally occurring nucleotide sequences encoding 
kinases. The stringency of the hybridization conditions will 
determine whether the probe identifies only the native 
nucleotide sequence of that specific kinase or sequences of 
closely related molecules. If degenerate kinase nucleotide 
sequences of the subject invention are used for the detection 
of related kinase encoding sequences, they should preferably 
contain at least 50% of the nucleotides of the sequences 
presented herein. Hybridization probes of the subject inven- 
tion may be derived from the nucleotide sequences of the 
SEQ ID NOs 1-44, or from surrounding or included 
genomic sequences comprising untranslated regions such as 
promoters, enhancers and introns. Such hybridization probes 
may be labelled with appropriate reporter molecules. Means 
for producing specific hybridization probes for kinases 
include oligolabelling, nick translation, end-labelling or 
PCR amplification using a labelled nucleotide. Alternatively, 
the cDNA sequence may be cloned into a vector for the 
production of mRNA probe. Such vectors are known in the 
art, are commercially available, and may be used to synthe- 
size RNA probes in vitro by addition of an appropriate RNA 
polymerase such as T7, T3 or SP6 and labelled nucleotides. 
A number of companies (such as Pharmacia Biotech, 
Piscataway, N.J.; Promega, Madison, Wis.; US Biochemical
Corp, Cleveland, Ohio; etc.) supply commercial kits and 
protocols for these procedures. 

It is also possible to produce a DNA sequence, or portions 
thereof, entirely by synthetic chemistry. Sometimes the 
source of information for producing this sequence comes 
from the known homologous sequence from closely related 
organisms. After synthesis, the nucleic acid sequence can be 
used alone or joined with a preexisting sequence and 
inserted into one of the many available DNA vectors and 
their respective host cells using techniques well known in 
the art. Moreover, synthetic chemistry may be used to 
introduce specific mutations into the nucleotide sequence. 
Alternatively, a portion of sequence in which a mutation is 
desired can be synthesized and recombined with a portion of 
an existing genomic or recombinant sequence. 

The kinase nucleotide sequences can be used individually,
or in panels, in a diagnostic test or assay to detect disorder 
or disease processes associated with abnormal levels of 
kinase expression. The nucleotide sequence is added to a 
sample (fluid, cell or tissue) from a patient under hybridizing 
conditions. After an incubation period, the sample is washed 
with a compatible fluid which optionally contains a reporter 
molecule which will bind the specific nucleotide. After the 
compatible fluid is rinsed off, the reporter molecule is 
quantitated and compared with a standard for that fluid, cell 
or tissue. If kinase expression is significantly different from 
the standard, the assay indicates the presence of disorder or 
disease. The form of such qualitative or quantitative meth- 
ods may include northern analysis, dot blot or other mem- 
brane based technologies, dip stick, pin or chip technologies, 
PCR, ELISAs or other multiple sample format technologies. 

This same assay, combining a sample with the nucleotide 
sequence, is applicable in evaluating the efficacy of a 
particular therapeutic treatment regime. It may be used in
animal studies, in clinical trials, or in monitoring the treat-
ment of an individual patient. First, standard expression
must be established for use as a basis of comparison.
Second, samples from the animals or patients affected by the
disorder or disease are combined with the nucleotide
sequence to evaluate the deviation from the standard or
normal profile. Third, an existing therapeutic agent is
administered, and a treatment profile is generated. The assay 
is evaluated to determine whether the profile progresses 
toward or returns to the standard pattern. Successive treat- 
ment profiles may be used to show the efficacy of treatment 
over a period of several days or several months. 

The nucleotide sequence for any particular kinase (SEQ
ID NOs 1-45) can also be used to generate probes for
mapping the native genomic sequence. The sequence may be
mapped to a particular chromosome or to a specific region
of the chromosome using well known techniques. These
include in situ hybridization to chromosomal spreads
(Verma et al (1988) Human Chromosomes: A Manual of
Basic Techniques, Pergamon Press, New York City), flow-
sorted chromosomal preparations, or artificial chromosome
constructions such as yeast artificial chromosomes (YACs),
bacterial artificial chromosomes (BACs), bacterial P1 con-
structions or single chromosome cDNA libraries.

In situ hybridization of chromosomal preparations and
physical mapping techniques such as linkage analysis using
established chromosomal markers are invaluable in extend-
ing genetic maps. Examples of genetic maps can be found in
the 1994 Genome Issue of Science (265:1981f). Often the
placement of a gene on the chromosome of another mam-
malian species may reveal associated markers even if the
number or arm of a particular human chromosome is not
known. New partial nucleotide sequences can be assigned to
chromosomal arms, or parts thereof, by physical mapping.
This provides valuable information to investigators search-
ing for disease genes using positional cloning or other gene
discovery techniques. Once a disease or syndrome, such as
ataxia telangiectasia (AT), has been crudely localized by
genetic linkage to a particular genomic region, for example,
AT to 11q22-23 (Gatti et al (1988) Nature 336:577-580),
any sequences mapping to that area may represent genes for
further investigation. The nucleotide sequences of the sub-
ject invention may also be used to detect differences in the
chromosomal location of nucleotide sequences due to
translocation, inversion, etc. between normal and carrier or
affected individuals.

The partial nucleotide sequence encoding a particular
kinase may be used to produce an amino acid sequence using
well known methods of recombinant DNA technology.
Goeddel (1990, Gene Expression Technology, Methods in
Enzymology, Vol 185, Academic Press, San Diego, Calif.) is
one among many publications which teach expression of an
isolated, purified nucleotide sequence. The amino acid or
peptide may be expressed in a variety of host cells, either
prokaryotic or eukaryotic. Host cells may be from the same
species from which the nucleotide sequence was derived or
from a different species. Advantages of producing an amino
acid sequence or peptide by recombinant DNA technology
include obtaining adequate amounts for purification and the
availability of simplified purification procedures.

Cells transformed with a kinase nucleotide sequence may
be cultured under conditions suitable for the expression and
recovery of peptide from cell culture. The peptide produced
by a recombinant cell may be secreted or may be contained
intracellularly depending on the sequence itself and/or the
vector used. In general, it is more convenient to prepare
recombinant proteins in secreted form, and this is accom-
plished by ligating kin to a recombinant nucleotide sequence
which directs its movement through a particular prokaryotic
or eukaryotic cell membrane. Other recombinant construc-
tions may join kin to nucleotide sequence encoding a
polypeptide domain which will facilitate protein purification
(Kroll J. et al (1993) DNA Cell Biol 12:441-53).

Direct peptide synthesis using solid-phase techniques 
(Stewart et al (1969) Solid-Phase Peptide Synthesis, WH 
Freeman Co, San Francisco, Calif.; Merrifield J. (1963) J 
Am Chem Soc 85:2149-2154) is an alternative to recom- 
binant or chimeric peptide production. Automated synthesis 
may be achieved, for example, using Applied Biosystems
431A Peptide Synthesizer in accordance with the instruc-
tions provided by the manufacturer. Additionally a particular
kinase sequence or any part thereof may be mutated during
direct synthesis and combined using chemical methods with
other kinase sequence(s) or a part thereof. This chimeric
nucleotide sequence can also be placed in an appropriate 
vector and host cell to produce a variant peptide. 

Although an amino acid sequence or oligopeptide used for 
antibody induction does not require biological activity, it 
must be immunogenic. KIN used to induce specific anti- 
bodies may have an amino acid sequence consisting of at 
least five amino acids and preferably at least 10 amino acids. 
Short stretches of amino acid sequence may be fused with
those of another protein such as keyhole limpet hemocyanin, 
and the chimeric peptide used for antibody production. 
Alternatively, the oligopeptide may be of sufficient length to 
contain an entire domain. 

Antibodies specific for KIN may be produced by inocu- 
lation of an appropriate animal with an antigenic fragment of 
the peptide. An antibody is specific for KIN if it is produced 
against an epitope of the polypeptide and binds to at least 
part of the natural or recombinant protein. Antibody pro- 
duction includes not only the stimulation of an immune 
response by injection into animals, but also analogous 
processes such as the production of synthetic antibodies, the 
screening of recombinant immunoglobulin libraries for 
specific-binding molecules (Orlandi R. et al (1989) PNAS 
86:3833-3837, or Huse W. D. et al (1989) Science 
246:1275-1281), or the in vitro stimulation of lymphocyte
populations. Current technology (Winter G. and Milstein C. 
(1991) Nature 349:293-299) provides for a number of 
highly specific binding reagents based on the principles of 
antibody formation. These techniques may be adapted to 
produce molecules which specifically bind kinase peptides. 
Antibodies or other appropriate molecules generated against 
a specific immunogenic peptide fragment or oligopeptide 
can be used in Western analysis, enzyme-linked immunosor- 
bent assays (ELISA) or similar tests to establish the presence 
of or to quantitate amounts of kinase active in normal, 
diseased, or therapeutically treated cells or tissues. 

The examples below are provided to illustrate the subject 
invention. These examples are provided by way of illustra- 
tion and are not included for the purpose of limiting the 
invention. 

EXAMPLES 
I cDNA Library Construction 

The kinase sequences of this application (Table 1) were 
first identified among the sequences comprising various 
libraries. Technology has advanced considerably since the 
first cDNA libraries were made. Many small variations in 
both chemicals and machinery have been instituted over 
time, and these have improved both the efficiency and safety 
of the process. Although the cDNAs could be obtained using 
an older procedure, the procedure presented in this applica-
tion is exemplary of one currently being used by persons
skilled in the art. For the purpose of providing an exemplary
method, the tissue preparation, mRNA isolation and cDNA
library construction described here is for the rheumatoid
synovium library from which the Incyte Clones 191283 and
192268 for ser/thr kinases were obtained.

Rheumatoid synovial tissue was obtained from the hip
joint removed from a 68 year old female with erosive,
nodular rheumatoid arthritis. The tissue was frozen, ground
to powder in a mortar and pestle, and lysed immediately in
buffer containing guanidinium isothiocyanate. The lysate
was centrifuged over a CsCl cushion (18 hrs at 25,000 rpm
using a Beckman SW28 rotor and ultracentrifuge; Beckman
Instruments, Palo Alto, Calif.), ethanol precipitated, resus-
pended in water and DNase treated for 15 min at 37° C. The
RNA was extracted with phenol chloroform and precipitated
with ethanol. Polyadenylated messages were isolated using
Qiagen Oligotex (QIAGEN Inc, Chatsworth, Calif.), and a
custom cDNA library was constructed by Stratagene (La
Jolla, Calif.).

First strand cDNA synthesis was accomplished using an
oligo (dT) primer/linker which also contained an XhoI
restriction site. Second strand synthesis was performed
using a combination of DNA polymerase I, E. coli ligase and
RNase H, followed by the addition of an EcoRI linker to the
blunt ended cDNA. The EcoRI linked, double-stranded
cDNA was then digested with XhoI restriction enzyme,
extracted with phenol chloroform, and fractionated by size
on Sephacryl S400. DNA of the appropriate size was then
ligated to dephosphorylated Lambda Zap® arms
(Stratagene) and packaged using Gigapack extracts
(Stratagene). pBluescript (Stratagene) phagemid DNAs
were excised en masse from the library.

In the alternative, DNAs were purified using Miniprep
Kits (Catalog #77468; Advanced Genetic Technologies
Corporation, Gaithersburg, Md.). These kits provide a
96-well format and enough reagents for 960 purifications.
The recommended protocol supplied with each kit has been
employed except for the following changes. First, the 96
wells are each filled with only 1 ml of sterile Terrific broth
(LIFE TECHNOLOGIES™, Gaithersburg, Md.) with carbe-
nicillin at 25 mg/L (2xCarb) and glycerol at 0.4%. After the
wells are inoculated, the bacteria are cultured for 24 hours
and lysed with 60 μl of lysis buffer. A centrifugation step
(2900 rpm for 5 minutes) is performed before the contents
of the block are added to the primary filter plate. The
optional step of adding isopropanol to TRIS buffer is not
routinely performed. After the last step in the protocol,
samples are transferred to a Beckman 96-well block for
storage.

II Sequencing of cDNA Clones 

The cDNA inserts from random isolates of the rheumatoid
synovium or other appropriate library were sequenced in
part. Methods for DNA sequencing are well known in the art
and employ such enzymes as the Klenow fragment of DNA
polymerase I, SEQUENASE® (US Biochemical Corp) or
Taq polymerase. Methods to extend the DNA from an
oligonucleotide primer annealed to the DNA template of
interest have been developed for both single- and double-
stranded templates. Chain termination reaction products
were separated using electrophoresis and detected via their
incorporated, labelled precursors. Recent improvements in
mechanized reaction preparation, sequencing and analysis
have permitted expansion in the number of sequences that
can be determined per day. Preferably, the process is auto-
mated with machines such as the Hamilton Micro Lab 2200
(Hamilton, Reno, Nev.), Peltier Thermal Cycler (PTC200;
MJ Research, Watertown, Mass.) and the Applied Biosys-
tems Catalyst 800 and 377 and 373 DNA sequencers.

The quality of any particular cDNA library may be
determined by performing a pilot scale analysis of 192
cDNAs and checking for percentages of clones containing
vector, lambda or E. coli DNA, mitochondrial or repetitive
DNA, and clones with exact or homologous matches to
public databases. The number of unique sequences (those
having no known match in any available database) was
recorded.
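
The pilot-scale tally described above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `category_percentages` and the category labels are hypothetical, and the input is assumed to be one category label per sequenced clone.

```python
from collections import Counter


def category_percentages(labels):
    """Return the percentage of clones that fall into each category.

    `labels` holds one (hypothetical) category string per clone, for
    example "vector", "mitochondrial", "database_match" or "unique".
    """
    if not labels:
        raise ValueError("at least one clone label is required")
    counts = Counter(labels)
    total = len(labels)
    return {category: 100.0 * n / total for category, n in counts.items()}
```

For a pilot of 192 clones, the function would be called with a list of 192 labels, and the "unique" entry of the result would correspond to the recorded share of sequences with no database match.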

III Homology Searching of cDNA Clones and Their
Deduced Proteins 

Each sequence so obtained was compared to sequences in 
GenBank using a search algorithm developed by Applied 
Biosystems and incorporated into the INHERIT™ 670 
Sequence Analysis System. In this algorithm, Pattern Speci- 
fication Language (TRW Inc, Los Angeles, Calif.) was used 
to determine regions of homology. The three parameters that 
determine how the sequence comparisons run were window 
size, window offset, and error tolerance. Using a combina- 
tion of these three parameters, the DNA database was 
searched for sequences containing regions of homology to 
the query sequence, and the appropriate sequences were 
scored with an initial value. Subsequently, these homolo- 
gous regions were examined using dot matrix homology 
plots to distinguish regions of homology from chance 
matches. Smith-Waterman alignments were used to display 
the results of the homology search. 
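
As a rough illustration of the Smith-Waterman local alignment mentioned above, the following minimal sketch computes only the best local alignment score. The scoring values (match, mismatch, gap) and the function name are assumptions made for this example; they are not the parameters used by the INHERIT system, which the text does not specify.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ch_a in a:
        curr = [0]  # first column is always zero
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Because every cell is floored at zero, unrelated flanking sequence does not lower the score of a well-matched internal region, which is what makes the method suitable for displaying regions of homology.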

Peptide and protein sequence homologies were ascer- 
tained using the INHERIT™ 670 Sequence Analysis System
in a way similar to that used in DNA sequence homologies. 
Pattern Specification Language and parameter windows 
were used to search protein databases for sequences con- 
taining regions of homology which were scored with an 
initial value. Dot-matrix homology plots were examined to 
distinguish regions of significant homology from chance 
matches. 

Alternatively, BLAST, which stands for Basic Local 
Alignment Search Tool, is used to search for local sequence 
alignments (Altschul S. F. (1993) J Mol Evol 36:290-300;
Altschul, S. F. et al (1990) J Mol Biol 215:403-10). BLAST
produces alignments of both nucleotide and amino acid
sequences to determine sequence similarity. Because of the
local nature of the alignments, BLAST is especially useful
in determining exact matches or in identifying homologs.
While it is useful for matches which do not contain gaps, it
is inappropriate for performing motif-style searching. The
fundamental unit of BLAST algorithm output is the High- 
scoring Segment Pair (HSP). 

An HSP consists of two sequence fragments of arbitrary
but equal lengths whose alignment is locally maximal and
for which the alignment score meets or exceeds a threshold
or cutoff score set by the user. The BLAST approach is
to look for HSPs between a query sequence and a database 
sequence, to evaluate the statistical significance of any 
matches found, and to report only those matches which 
satisfy the user-selected threshold of significance. The 
parameter E establishes the statistically significant threshold 
for reporting database sequence matches. E is interpreted as 
the upper bound of the expected frequency of chance 
occurrence of an HSP (or set of HSPs) within the context of 
the entire database search. Any database sequence whose 
match satisfies E is reported in the program output. 
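
The role of the E parameter can be sketched as follows. The text does not give a formula, so this example assumes the standard Karlin-Altschul estimate E = K·m·n·exp(−λS), where m and n are the query and database lengths and S is the HSP score. The values of `K` and `lam`, the function names and the dictionary layout of an HSP are placeholders chosen for illustration; real values depend on the scoring system.

```python
import math


def expected_chance_hits(score, query_len, db_len, K=0.1, lam=0.3):
    """Karlin-Altschul estimate of chance HSPs scoring at least `score`.

    K and lam are placeholder statistical parameters (assumptions).
    """
    return K * query_len * db_len * math.exp(-lam * score)


def reportable(hsps, e_threshold, query_len, db_len):
    """Keep only HSPs whose expected chance frequency is within the threshold."""
    return [
        hsp for hsp in hsps
        if expected_chance_hits(hsp["score"], query_len, db_len) <= e_threshold
    ]
```

A higher-scoring HSP has a smaller expected chance frequency, so tightening the threshold removes the weakest matches first.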

All the kinase molecules presented in this application
were examined using INHERIT. Although their identifica-
tion was based on the criteria above, their homology to
known kinase molecules and name are subject to change
when additional computer analysis against additional or
more recent database information is employed. For example,
whereas the first two kinases in Table 1 were initially
identified as unique Incyte clones, homologous mouse and
human kinases are now known. In other cases, additional
sequence information has become available and its review
against the known databases has precipitated a name change.
Occasionally a clone number will also disappear from the
LIFESEQ™ database (Incyte Pharmaceuticals Inc, Palo
Alto, Calif.). This situation generally arises during the
regular review of clones and assembly of contiguous
sequences.

IV Extension of cDNAs to Full Length 
The kinase sequences presented here can be used to
design oligonucleotide primers for the extension of the
cDNAs to full length. In fact, the partial map kinase cDNA
sequence (SEQ ID NO 38) initially identified in Incyte clone
214915 among the sequences comprising the human stom-
ach cell library was extended to full length as shown in "A
Novel Human Map Kinase Homolog" by Hawkins et al,
Incyte Docket PF-036P, filed on Jun. 28, 1995, incorporated
herein by reference. The coding region of this full length
sequence (SEQ ID NO 45; Incyte Clone 214915E) begins at
nucleotide 58 and ends at nucleotide 1156.

Primers are designed based on known sequence; one
primer is synthesized to initiate extension in the antisense
direction (XLR) and the other to extend sequence in the
sense direction (XLF). The primers allow the sequence to be
extended "outward" generating amplicons containing new,
unknown nucleotide sequence for the gene of interest. The
primers may be designed using Oligo 4.0 (National Bio-
sciences Inc, Plymouth, Minn.), or another appropriate
program, to be 22-30 nucleotides in length, to have a GC
content of 50% or more, and to anneal to the target sequence
at temperatures about 68°-72° C. Any stretch of nucleotides
which would result in hairpin structures and primer-primer
dimerizations was avoided.
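
The length and GC-content criteria stated above are simple enough to check programmatically. The sketch below is not the Oligo 4.0 algorithm; the function names are hypothetical, and annealing temperature, hairpins and primer-primer dimers are deliberately not modeled. It is applied to the XLR and XLF primers reported in this example.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer (spaces are ignored)."""
    seq = primer.replace(" ", "").upper()
    return sum(base in "GC" for base in seq) / len(seq)


def meets_design_criteria(primer, min_len=22, max_len=30, min_gc=0.50):
    """Check only the stated length (22-30 nt) and GC (50% or more) criteria."""
    seq = primer.replace(" ", "")
    return min_len <= len(seq) <= max_len and gc_fraction(seq) >= min_gc


XLR = "AAG ACA TCC AGG AGC CCA ATG AC"
XLF = "AGG TGA TCC TCA GCT GGA TGC AC"
```

Both primers are 23 nucleotides long with a GC content just above 50%, so both satisfy these two criteria.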
The stomach cDNA library was used as a template, and
XLR=AAG ACA TCC AGG AGC CCA ATG AC and
XLF=AGG TGA TCC TCA GCT GGA TGC AC primers
were used to extend and amplify the 214915 sequence. By
following the instructions for the XL-PCR kit and thor-
oughly mixing the enzyme and reaction mix, high fidelity
amplification is obtained. Beginning with 25 pMol of each
primer and the recommended concentrations of all other
components of the kit, PCR is performed using the Peltier
Thermal Cycler (PTC200; MJ Research, Watertown, Mass.)
and the following parameters:
Step 1 94° C. for 60 sec (initial denaturation)
Step 2 94° C. for 15 sec
Step 3 65° C. for 1 min
Step 4 68° C. for 7 min
Step 5 Repeat step 2-4 for 15 additional cycles
Step 6 94° C. for 15 sec
Step 7 65° C. for 1 min
Step 8 68° C. for 7 min+15 sec/cycle
Step 9 Repeat step 6-8 for 11 additional cycles
Step 10 72° C. for 8 min
Step 11 4° C. (and holding)
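
The cycle count implied by this program can be verified with a short sketch. The data layout and function names are illustrative only. The extension-time helper assumes that the 15-second increment is first added on the second cycle of the block, which the text does not state explicitly.

```python
# (first_step, last_step, additional_repeats) for each "Repeat" line above
REPEAT_BLOCKS = [(2, 4, 15), (6, 8, 11)]


def total_cycles(repeat_blocks):
    """Each block runs once on the first pass plus its additional repeats."""
    return sum(1 + extra for _first, _last, extra in repeat_blocks)


def extension_times(base_s, increment_s, n_cycles):
    """Extension time in seconds for each cycle of an auto-extending block.

    Assumption: the first cycle uses the base time and every later cycle
    adds one more increment.
    """
    return [base_s + increment_s * i for i in range(n_cycles)]
```

The two repeated blocks give 16 + 12 = 28 cycles in total, and under the stated assumption the extension step of the second block grows from 7 min to 9 min 45 sec.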

At the end of 28 cycles, 50 μl of the reaction mix was
removed; and the remaining reaction mix was run for an
additional 10 cycles as outlined below:
Step 1 94° C. for 15 sec
Step 2 65° C. for 1 min
Step 3 68° C. for (10 min+15 sec)/cycle
Step 4 Repeat step 1-3 for 9 additional cycles
Step 5 72° C. for 10 min

A 5-10 μl aliquot of the reaction mixture is analyzed by
electrophoresis on a low concentration (about 0.6-0.8%)
agarose mini-gel to determine which reactions were suc-
cessful in extending the sequence. Although all extensions
potentially contain a full length gene, some of the largest
products or bands are selected and cut out of the gel. Further
purification involves using a commercial gel extraction
method such as QIAQuick™ (QIAGEN Inc). After recovery
of the DNA, Klenow enzyme is used to trim single-stranded,
nucleotide overhangs creating blunt ends which facilitate
religation and cloning.

After ethanol precipitation, the products are redissolved in
13 μl of ligation buffer. Then, 1 μl T4-DNA ligase (15 units)
and 1 μl T4 polynucleotide kinase are added, and the mixture
is incubated at room temperature for 2-3 hours or overnight
at 16° C. Competent E. coli cells (in 40 μl of appropriate
media) are transformed with 3 μl of ligation mixture and
cultured in 80 μl of SOC medium (Sambrook J. et al, supra).
After incubation for one hour at 37° C., the whole transfor-
mation mixture is plated on Luria Bertani (LB)-agar
(Sambrook J. et al, supra) containing 2xCarb. The following
day, 12 colonies are randomly picked from each plate and
cultured in 150 μl of liquid LB/2xCarb medium placed in an
individual well of an appropriate, commercially-available,
sterile 96-well microtiter plate. The following day, 5 μl of
each overnight culture is transferred into a non-sterile
96-well plate and after dilution 1:10 with water, 5 μl of each
sample is transferred into a PCR array.

For PCR amplification, 15 μl of concentrated PCR reac-
tion mix (1.33x) containing 0.75 units of Taq polymerase, a 
vector primer and one or both of the gene specific primers 
used for the extension reaction are added to each well. 
Amplification is performed using the following conditions: 

Step 1 94° C. for 60 sec
Step 2 94° C. for 20 sec
Step 3 55° C. for 30 sec
Step 4 72° C. for 90 sec
Step 5 Repeat steps 2-4 for an additional 29 cycles
Step 6 72° C. for 180 sec
Step 7 4° C. (and holding)
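
The reason for using a 1.33x reaction mix can be shown with a short calculation. This sketch assumes that each well holds the 5 μl of diluted culture described earlier plus the 15 μl of mix, for a 20 μl reaction; the function name is hypothetical.

```python
def final_concentration(stock_x, stock_volume_ul, sample_volume_ul):
    """Working concentration (in 'x') after the stock is diluted by the sample."""
    total_volume_ul = stock_volume_ul + sample_volume_ul
    return stock_x * stock_volume_ul / total_volume_ul


# 15 ul of 1.33x mix plus 5 ul of sample gives approximately a 1x reaction.
working_x = final_concentration(1.33, 15, 5)
```

Under that assumption the mix is diluted to about 1x (0.9975x) in the final reaction volume.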

Aliquots of the PCR reactions are run on agarose gels 
together with molecular weight markers. The sizes of the 
PCR products are compared to the original partial cDNAs, 
and appropriate clones are selected, ligated into plasmid and 
sequenced. 

V Diagnostic Assays Using Kinase Specific Oligomers 

In those cases where a specific disorder or disease (see 
definitions supra) is suspected to involve altered quantities 
of a particular kinase, oligomers may be designed to estab- 
lish the presence and/or quantity of mRNA expressed in a 
biological sample. There are several methods currently
being used to quantitate the expression of a particular
molecule. Most of these methods use radiolabelled (Melby
P. C. et al 1993 J Immunol Methods 159:235-44) or bioti-
nylated (Duplaa C. et al 1993 Anal Biochem 229-36)
nucleotides, coamplification of a control nucleic acid, and 
standard curves onto which the experimental results are 
interpolated. For example, phosphorylase B kinase defi- 
ciency may manifest as hepatomegaly which is inherited as 
either an X-linked or autosomal recessive trait or myoglo- 
binuria whose inheritance is unknown. 

Oligomers for phosphorylase B kinase are first used in 
quantitative PCR to establish a normal range for expression 
of phosphorylase B kinase. Then, these same oligomers are
used with extracts of cells from patients with inherited
phosphorylase B kinase deficiency. The information from
such studies is used to define different inheritance patterns
and to diagnose future patients displaying phosphorylase B
kinase deficiency-like symptoms. In like manner, this same
assay can be used to monitor progress of the patient as
his/her physiological situation moves toward the normal
range during therapy for the condition.

VI Kinases Kit 

The kinases of the subject invention are used to produce
a kinases kit for diagnosing disorders or diseases associated
with altered kinase expression. This involves designing
a plurality of oligomers, one set of which is specific for each
kinase or kinase regulatory sequence. Specificity in this case
refers to sequence similarity, to the length of the nucleic acid
molecule amplified, to cell or tissue type being screened or
to the disorder or disease. These oligomers are combined
with a biological sample obtained from a patient in a
solution sufficient for PCR and amplified. The PCR products
are examined first, to detect the expression of each kinase,
and second to quantify the expression of each kinase. Kinase
expression is compared with standard ranges for normal and
abnormal expression. In the case(s) where kinase expression
is altered, use of the kit has provided the physician with a
named disorder or disease which can be treated or further
investigated.

A further use of the oligomers from the kinases kit is in
a diagnostic assay of example V (above) used to monitor
patient response to drug therapy. Once the disease has been
named and a therapy chosen, the oligomers specific to the
patient's disease may be used periodically to monitor the
efficacy of the chosen therapy. In this case, the specific
oligomers are combined with a biological sample from the
patient in a solution sufficient for PCR and amplified. The
PCR product is quantified and compared with a normal
standard and with the pretreatment profile of the patient. If
the kinase expression is tending toward normal, the therapy
may be considered effective; if the expression is even more
abnormal, therapy should be discontinued and an alternative
treatment instituted.
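
The comparison logic of this monitoring step can be sketched as follows. This is an illustration of the decision rule only, under the simplifying assumption that the normal standard is a single value rather than a range. The function name and return strings are hypothetical, and no clinical thresholds are implied.

```python
def therapy_trend(normal, pretreatment, current):
    """Compare the current expression level with the pretreatment level.

    All three arguments are quantified PCR product levels in the same units.
    """
    before = abs(pretreatment - normal)  # deviation before therapy
    after = abs(current - normal)        # deviation at this time point
    if after < before:
        return "tending toward normal"
    if after > before:
        return "more abnormal"
    return "unchanged"
```

Repeating the comparison at successive time points would give the periodic monitoring described above.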

VII Sense or Antisense Molecules 

Knowledge of the correct cDNA sequence of any particu-
lar kinase, its regulatory elements or parts thereof will
enable its use as a tool in sense (Youssoufian H. and H. F.
Lodish (1993) Mol Cell Biol 13:98-104) or antisense
(Eguchi et al (1991) Annu Rev Biochem 60:631-652) tech-
nologies for the investigation of gene function.
Oligonucleotides, from genomic or cDNAs, comprising
either the sense or the antisense strand of the cDNA
sequence can be used in vitro or in vivo to inhibit expression.
Such technology is now well known in the art, and oligo- 
nucleotides or other fragments can be designed from various 
locations along the sequences. 
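
An antisense oligonucleotide for a chosen stretch of sense-strand sequence is its reverse complement, which can be derived as in the minimal sketch below. The function name is hypothetical, and only unambiguous DNA bases are handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the antisense (reverse-complement) strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sense sequence, which is a convenient sanity check when designing oligonucleotides from various locations along a cDNA.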
The gene of interest can be turned off in the short term by
transfecting a cell or tissue with expression vectors which
will flood the cell with sense or antisense sequences until all
copies of the vector are disabled by endogenous nucleases.
Stable transfection of appropriate germ line cells or prefer-
ably a zygote with a vector containing the fragment will
produce a transgenic organism (U.S. Pat. No. 4,736,866, 12
Apr. 1988), which produces enough copies of the sense or
antisense sequence to significantly compromise or entirely
eliminate normal activity of the particular kinase gene.
Frequently, the function of the gene can be ascertained by
observing behaviors such as lethality, loss of a physiological
pathway, changes in morphology, etc. at the intracellular,
cellular, tissue or organismal level.

In addition to using fragments constructed to interrupt
transcription of the open reading frame, modifications of
gene expression can be obtained by designing antisense
sequences to promoters, enhancers, introns, or even to
trans-acting regulatory genes. Similarly, inhibition can be
achieved using Hogeboom base-pairing methodology, also
known as "triple helix" base pairing.

VIII Expression of Kinases

Expression of the kinases may be accomplished by
subcloning the cDNAs into appropriate vectors and transfecting
the vectors into host cells. In some cases, the cloning vector
previously used for the generation of the tissue library also
provides for direct expression of kinase sequences in E. coli.
Upstream of the cloning site, this vector contains a promoter
for β-galactosidase, followed by sequence containing the
amino-terminal Met and the subsequent 7 residues of
β-galactosidase. Immediately following these eight residues
is a bacteriophage promoter useful for transcription and a
linker containing a number of unique restriction sites.

Induction of an isolated, transfected bacterial strain with
IPTG using standard methods will produce a fusion protein
corresponding to the first seven residues of β-galactosidase,
about 5 to 15 residues which correspond to linker, and the
peptide encoded within the kinase cDNA. Since cDNA
clone inserts are generated by an essentially random process,
there is one chance in three that the included cDNA will lie
in the correct frame for proper translation. If the cDNA is not
in the proper reading frame, it can be obtained by deletion
or insertion of the appropriate number of bases by well
known methods including in vitro mutagenesis, digestion
with exonuclease III or mung bean nuclease, or
oligonucleotide linker inclusion.

The kinase cDNA can be shuttled into other vectors
known to be useful for expression of protein in specific
hosts. Oligonucleotide linkers containing cloning sites as
well as a stretch of DNA sufficient to hybridize to the end of
the target cDNA (25 bases) can be synthesized chemically
by standard methods. These primers can then be used to
amplify the desired gene fragments by PCR. The resulting
fragments can be digested with appropriate restriction
enzymes under standard conditions and isolated by gel
electrophoresis. Alternatively, similar gene fragments can be
produced by digestion of the cDNA with appropriate
restriction enzymes and filling in the missing gene sequence with
chemically synthesized oligonucleotides. Partial nucleotide
sequence from more than one gene can be ligated together
and cloned in appropriate vectors to optimize expression.

Suitable expression hosts for such chimeric molecules
include but are not limited to mammalian cells such as
Chinese Hamster Ovary (CHO) and human 293 cells, insect
cells such as Sf9 cells, yeast cells such as Saccharomyces
cerevisiae, and bacteria such as E. coli. For each of these cell
systems, a useful expression vector may also include an
origin of replication to allow propagation in bacteria and a
selectable marker such as the β-lactamase antibiotic
resistance gene to allow selection in bacteria. In addition, the
vectors may include a second selectable marker such as the
neomycin phosphotransferase gene to allow selection in
transfected eukaryotic host cells. Vectors for use in
eukaryotic expression hosts may require RNA processing elements
such as 3' polyadenylation sequences if such are not part of
the cDNA of interest.

Additionally, some of the kinase vectors may contain
native promoters which will allow induction of gene
expression in human cells such as the 293 line mentioned above.
Other available promoters are host specific and may be
specifically combined with the coding region of the kinase
of interest. They include MMTV, SV40, and metallothionine
promoters for CHO cells; trp, lac, tac and T7 promoters for
bacterial hosts; and alpha factor, alcohol oxidase and PGH
promoters for yeast. In addition, transcription enhancers,
such as the rous sarcoma virus (RSV) enhancer, may be used
in mammalian host cells. Once homogeneous cultures of
recombinant cells are obtained through standard culture
methods, large quantities of recombinantly produced peptide
can be recovered from the conditioned medium and
analyzed using methods known in the art.

IX Isolation of Recombinant KIN

KIN may be expressed as a recombinant protein with one
or more additional polypeptide domains added to facilitate
protein purification. Such purification facilitating domains
include, but are not limited to, metal chelating peptides such
as histidine-tryptophan modules that allow purification on
immobilized metals, protein A domains that allow
purification on immobilized immunoglobulin, and the domain
utilized in the FLAGS extension/affinity purification system
(Immunex Corp, Seattle, Wash.). The inclusion of a
cleavable linker sequence such as Factor XA or enterokinase
(Invitrogen) between the purification domain and the kin
sequence may be useful to facilitate expression of KIN.

X Testing for Kinase Activity

The sequences in this application represent many different
domains of different kinase families. These domains (and
subdomains as detailed in the background of the invention)
may be utilized: 1) individually for the production of
antibodies, 2) in functional groups (eg. to span a membrane),
and 3) as interchangeable, usable parts of a chimeric kinase.

The various partial cDNA sequences of this application
represent the different kinase domains of the various
families (Hardie G. and Hanks S., supra), and they may be
recombined in numerous ways to produce chimeric nucleic
acid molecules. For example, a known, full length kinase
such as the human map kinase of this application (Seq ID No
45) may be used to swap related portions of the nucleic acid
sequence, analogous to domains or subdomains of MAP
kinase polypeptides. The chimeric nucleotides, so produced,
may be introduced into prokaryotic host cells (as reviewed
in Strosberg A. D. and Marullo S. (1992) Trends Pharma Sci
13:95-98) or eukaryotic host cells. These host cells are then
employed in procedures to determine what molecules
activate the kinase or what molecules are activated by a kinase.
Such activating or activated molecules may be of
extracellular, intracellular, biologic or chemical origin.

An example of a test system, in this case for protein
tyrosine kinases, can be based on the interaction of protein
tyrosine kinases with chemokine receptors (Taniguchi T
(1995) Science 268:251-255). These receptors are capable
of activating a variety of nonreceptor protein tyrosine
kinases when stimulated by an extracellular chemokine.
C-X-C chemokines such as platelet factor 4, interleukin-8,
connective tissue activating protein III, neutrophil activating
peptide 2, are soluble activators of neutrophils.

A standard measure of neutrophil activation involves
measuring the mobilization of Ca2+ as part of the signal
transduction pathway. The experiment involves several
steps. First, blood cells obtained from venipuncture are
fractionated by centrifugation on density gradients. Enriched
populations of neutrophils are further fractionated on
columns by negative selection using antibodies specific for
other blood cell types. Next, neutrophils are transformed
with an expression vector containing the kinase nucleic acid
sequence of interest and preloaded fluorescent probe whose
emission characteristics have been altered by Ca2+ binding.
Or in the alternative, the neutrophil is preloaded with the
purified kinase of interest and fluorescent probe. Then, when
the cells are exposed to an appropriate chemokine, the
chemokine receptor activates the kinase which, in turn,
initiates Ca2+ flux. Ca2+ mobilization is observed and
measured using fluorometry as has been described in
Grynkiewicz G. et al (1985) J Biol Chem 260:3440, and McColl S.
et al (1993) J Immunol 150:4550-4555, incorporated herein
by reference.
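
For readers who want to see how a fluorometric reading is commonly converted into a Ca2+ concentration, the sketch below applies the ratio equation associated with the Grynkiewicz et al reference, [Ca2+] = Kd x beta x (R - Rmin) / (Rmax - R). It assumes a ratiometric probe (such as fura-2), which the text above does not specify; the function name and all numeric values are illustrative assumptions, and the calibration constants must come from the user's own instrument.

```python
def calcium_from_ratio(r, r_min, r_max, kd, beta=1.0):
    """Estimate free Ca2+ from a fluorescence ratio measurement.

    r      -- measured fluorescence ratio of the loaded cells
    r_min  -- ratio at zero Ca2+ (calibration)
    r_max  -- ratio at saturating Ca2+ (calibration)
    kd     -- dissociation constant of the probe for Ca2+
    beta   -- ratio of free-dye to bound-dye fluorescence at the
              denominator wavelength (calibration)

    The result is in the same units as kd.
    """
    if not r_min < r < r_max:
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd * beta * (r - r_min) / (r_max - r)


if __name__ == "__main__":
    # Illustrative calibration values only; kd is given in nM here.
    resting = calcium_from_ratio(r=1.0, r_min=0.5, r_max=5.0, kd=224.0)
    stimulated = calcium_from_ratio(r=2.5, r_min=0.5, r_max=5.0, kd=224.0)
    print(f"resting: {resting:.1f} nM, stimulated: {stimulated:.1f} nM")
```

Comparing the value before and after chemokine exposure gives a simple numerical readout of the Ca2+ mobilization that the assay is designed to observe.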

XI Identification of or Production of Kinase Specific Antibodies

Purified KIN is used to screen a pre-existing antibody 
library or to raise antibodies, using either polyclonal or 
monoclonal methodology. For polyclonal antibody 
production, denatured peptide from the reverse phase HPLC 
separation is obtained in quantities up to 75 mg. This 
denatured protein can be used to immunize mice or rabbits 
using standard protocols; about 100 micrograms are 
adequate for immunization of a mouse, while up to 1 mg 
might be used to immunize a rabbit. In identifying mouse 
hybridomas, the denatured protein can be labelled and used 
to screen potential murine B-cell hybridomas for those 
which produce antibody. This procedure requires only small 
quantities of protein, such that 20 mg would be sufficient for 
labelling and screening of several thousand clones. 

For monoclonal antibody production, the amino acid
sequence, as deduced from translation of the cDNA, is
analyzed to determine regions of high immunogenicity.
Peptides comprising appropriate hydrophilic regions are
expressed from recombinant cDNA or synthesized and used
in suitable immunization protocols to raise antibodies.
Selection of appropriate epitopes is described by Ausubel F. 
M. et al (supra). The optimal amino acid sequences for 
immunization are usually located at the C-terminus or 
N-terminus and in intervening, hydrophilic regions of the 
polypeptide which are likely to be exposed to the external 
environment when the protein is in its natural conformation. 
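
One way to locate candidate hydrophilic regions in a deduced amino acid sequence is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale, which is a substitution made for illustration: the specification does not name a scale or an algorithm, and the function names and window size are assumptions.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(protein, window=7):
    """Return (start index, mean hydropathy) for every window position."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        (start, sum(scores[start:start + window]) / window)
        for start in range(len(scores) - window + 1)
    ]


def most_hydrophilic(protein, window=7):
    """Return the (start, mean) of the window with the lowest hydropathy."""
    return min(window_means(protein, window), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy sequence: a lysine-rich stretch flanked by leucines.
    start, mean = most_hydrophilic("LLLLLLLKKKKKKKLLLLLLL", window=7)
    print(start, round(mean, 2))
```

A window scoring strongly negative is only a candidate; whether the region is actually surface-exposed in the native conformation still has to be judged as the text describes.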

Typically, selected oligopeptides, about 15 residues in
length, are synthesized using an Applied Biosystems Peptide
Synthesizer Model 431A using Fmoc chemistry and coupled
to keyhole limpet hemocyanin (KLH, Sigma) by reaction
with m-maleimidobenzoyl-N-hydroxysuccinimide ester
(MBS; Ausubel F. M. et al, supra). If necessary, a cysteine
may be introduced at the N-terminus of the peptide to permit
coupling to KLH. Rabbits are immunized with the
peptide-KLH complex in complete Freund's adjuvant. The resulting
antisera are tested for antipeptide activity by binding the
peptide to plastic, blocking with 1% bovine serum albumin,
reacting with antisera, washing and reacting with labelled,
affinity purified, specific goat anti-rabbit IgG.

Hybridomas may also be prepared and screened using
standard techniques. Hybridomas of interest are detected by
screening with labelled KIN to identify those fusions
producing the monoclonal antibody with the desired specificity.
In a typical protocol, wells of plates (FAST;
Becton-Dickinson, Palo Alto, Calif.) are coated during incubation
with affinity purified, specific rabbit anti-mouse (or suitable
anti-species Ig) antibodies at 10 mg/ml. The coated wells are
blocked with 1% BSA, washed and incubated with
supernatants from hybridomas. After washing, the wells are
incubated with labelled KIN at 1 mg/ml. Supernatants with
specific antibodies bind more labelled KIN than is detectable
in the background. Then clones producing specific
antibodies are expanded and subjected to two cycles of cloning at
limiting dilution. Cloned hybridomas are injected into
pristane-treated mice to produce ascites, and monoclonal
antibody is purified from mouse ascitic fluid by affinity
chromatography on Protein A. Monoclonal antibodies with
affinities of at least 10^8/M, preferably 10^9 to 10^10 or stronger,
will typically be made by standard procedures as described
in Harlow and Lane (1988) Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring
Harbor, N.Y.; and in Goding (1986) Monoclonal Antibodies:
Principles and Practice, Academic Press, New York City,
both incorporated herein by reference.

XII Diagnostic Assays Using KIN Specific Antibodies

Particular KIN antibodies are useful for investigation of
various disorders or diseases which may be characterized by
differences in the amount or distribution of KIN. Given the
usual role of the kinases, KIN might be expected to be
upregulated (or downregulated) in its involvement in
activation of signal cascades.

Diagnostic assays for KIN include methods utilizing the
antibody and a reporter molecule to detect KIN in human
body fluids, membranes, cells, tissues or extracts thereof.
The antibodies of the present invention may be used with or
without modification. Frequently, the antibodies will be
labelled by joining them, either covalently or noncovalently,
with a substance which provides for a detectable signal. A
wide variety of reporter molecules and conjugation
techniques are known and have been reported extensively in
both the scientific and patent literature. Suitable reporter
molecules or labels include those radionuclides, enzymes,
fluorescent, chemi-luminescent, or chromogenic agents
previously mentioned as well as substrates, cofactors,
inhibitors, magnetic particles and the like. Patents teaching
the use of such labels include U.S. Pat. Nos. 3,817,837;
3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and
4,366,241. Also, recombinant immunoglobulins may be
produced as shown in U.S. Pat. No. 4,816,567, incorporated
herein by reference.

A variety of protocols for measuring soluble or
membrane-bound KIN, using either polyclonal or
monoclonal antibodies specific for the protein, are known in the
art. Examples include enzyme-linked immunosorbent assay
(ELISA), radioimmunoassay (RIA) and fluorescent
activated cell sorting (FACS). A two-site monoclonal-based
immunoassay utilizing monoclonal antibodies reactive to
two non-interfering epitopes on KIN is preferred, but a
competitive binding assay may be employed. These assays
are described, among other places, in Maddox, D. E. et al
(1983, J Exp Med 158:1211).

XIII Purification of Native KIN Using Antibodies

Native or recombinant protein kinases can be purified by
immunoaffinity chromatography using antibodies specific
for that particular KIN. In general, an immunoaffinity
column is constructed by covalently coupling the anti-KIN
antibody to an activated chromatographic resin.

Polyclonal immunoglobulins are prepared from immune
sera either by precipitation with ammonium sulfate or by
purification on immobilized Protein A (Pharmacia Biotech).
Likewise, monoclonal antibodies are prepared from mouse
ascites fluid by ammonium sulfate precipitation or
chromatography on immobilized Protein A. Partially purified
immunoglobulin is covalently attached to a chromatographic resin
such as CnBr-activated Sepharose (Pharmacia Biotech). The
antibody is coupled to the resin, the resin is blocked, and the
derivative resin is washed according to the manufacturer's
instructions.

Such immunoaffinity columns may be utilized in the
purification of KIN by preparing a fraction from cells
containing KIN in a soluble form. This preparation may be
derived by solubilization of whole cells or of a subcellular
fraction obtained via differential centrifugation (with or
without addition of detergent) or by other methods well
known in the art. Alternatively, soluble KIN containing a
signal sequence may be secreted in useful quantity into the
medium in which the cells are grown.

A soluble KIN-containing preparation is passed over the
immunoaffinity column, and the column is washed under
conditions that allow the preferential absorbance of KIN (eg,
high ionic strength buffers in the presence of detergent).
Then, the column is eluted under conditions that disrupt
antibody/KIN binding (eg, a buffer of pH 2-3 or a high
concentration of a chaotrope such as urea or thiocyanate
ion), and KIN is collected.

XIV Drug Screening 

This invention is particularly useful for screening
therapeutic compounds by using binding fragments of KIN in any
of a variety of drug screening techniques. The molecules to 
be screened may be of extracellular, intracellular, biologic or 
chemical origin. The peptide fragment employed in such a 
test may either be free in solution, affixed to a solid support, 
borne on a cell surface or located intracellularly. One may 
measure, for example, the formation of complexes between 
KIN and the agent being tested. Alternatively, one can 
examine the diminution in complex formation between KIN 
and a receptor caused by the agent being tested. 

Methods of screening for drugs or any other agents which 
can affect signal transduction comprise contacting such an 
agent with KIN fragment and assaying for the presence of a 
complex between the agent and the KIN fragment. In such 
assays, the KIN fragment is typically labelled. After suitable 
incubation, free KIN fragment is separated from that present 
in bound form, and the amount of free or uncomplexed label 
is a measure of the ability of the particular agent to bind to 
KIN. 

Another technique for drug screening provides high
throughput screening for compounds having suitable
binding affinity to the KIN polypeptides and is described in detail
in European Patent Application 84/03564, published on Sep.
13, 1984, incorporated herein by reference. Briefly stated,
large numbers of different small peptide test compounds are 
synthesized on a solid substrate, such as plastic pins or some 
other surface. The peptide test compounds are reacted with 
KIN fragment and washed. Bound KIN fragment is then 
detected by methods well known in the art. Purified KIN can 
also be coated directly onto plates for use in the aforemen- 
tioned drug screening techniques. In addition, non- 
neutralizing antibodies can be used to capture the peptide 
and immobilize it on the solid support. 

This invention also contemplates the use of competitive 
drug screening assays in which neutralizing antibodies 
capable of binding KIN specifically compete with a test 
compound for binding to KIN fragments. In this manner, the 
antibodies can be used to detect the presence of any peptide 
which shares one or more antigenic determinants with KIN. 

XV Identification of Molecules Which Interact with KIN

The inventive purified KIN is a research tool for
identification, characterization and purification of
interacting, signal transduction pathway proteins.
Appropriate labels are incorporated into KIN by various methods
known in the art and KIN is used to capture soluble or
interact with membrane-bound molecules. A preferred
method involves labeling the primary amino groups in KIN
with 125I Bolton-Hunter reagent (Bolton, A. E. and Hunter,
W. M. (1973) Biochem J 133:529). This reagent has been
used to label various molecules without concomitant loss of
biological activity (Hebert C. A. et al (1991) J Biol Chem
266:18989-94; McColl S. et al (1993) J Immunol
150:4550-4555). Membrane-bound molecules are incubated
with the labelled KIN molecules, washed to remove
unbound molecules, and the KIN complex is quantified.
Data obtained using different concentrations of KIN are used
to calculate values for the number, affinity, and association
of KIN with the signal transduction complex.
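
Binding data collected at several ligand concentrations are often reduced to a number of sites and an affinity by Scatchard analysis. The sketch below assumes a single class of non-interacting sites and uses a plain least-squares line; the specification does not say which analysis is used, so the method, the function name, and the example numbers are all assumptions made for illustration.

```python
def scatchard_fit(free, bound):
    """Estimate (Kd, Bmax) from paired free and bound ligand concentrations.

    Scatchard form for one class of sites:
        bound / free = Bmax / Kd - bound / Kd
    so a straight line through (bound, bound/free) has slope -1/Kd and
    intercept Bmax/Kd.
    """
    if len(free) != len(bound) or len(free) < 2:
        raise ValueError("need at least two paired measurements")
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax


if __name__ == "__main__":
    # Synthetic data generated from Kd = 2.0 and Bmax = 10.0 (arbitrary units).
    free = [0.5, 1.0, 2.0, 4.0, 8.0]
    bound = [10.0 * f / (2.0 + f) for f in free]
    kd, bmax = scatchard_fit(free, bound)
    print(f"Kd = {kd:.3f}, Bmax = {bmax:.3f}")
```

With real data, curvature in the Scatchard plot would signal that the single-site assumption does not hold and that a different model is needed.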
Labelled KIN fragments are also useful as a reagent for
the purification of molecules with which KIN interacts, 
specifically including inhibitors. In one embodiment of 
affinity purification, KIN is covalently coupled to a chro- 
matography column. Cells and their membranes are 
extracted, KIN is removed and various KIN-free subcom- 
ponents are passed over the column. Molecules bind to the 
column by virtue of their KIN affinity. The KIN-complex is 
recovered from the column, dissociated and the recovered 
molecule is subjected to N-terminal protein sequencing. 
This amino acid sequence is then used to identify the 
captured molecule or to design degenerate oligomers for 
cloning its gene from an appropriate cDNA library. 

In an alternate method, monoclonal antibodies raised
against KIN fragments are screened to identify those which 
inhibit the binding of labelled KIN. These monoclonal 
antibodies are then used in affinity purification or expression 
cloning of associated molecules. Other soluble binding 
molecules are identified in a similar manner. Labelled KIN 
is incubated with extracts or other appropriate materials 
derived from rheumatoid synovium. After incubation, KIN 
complexes (which are larger than the lone KIN fragment) are 
identified by a sizing technique such as size exclusion 
chromatography or density gradient centrifugation and are 
purified by methods known in the art. The soluble binding 
protein(s) are subjected to N-terminal sequencing to obtain 
information sufficient for database identification, if the 
soluble protein is known, or for cloning, if the soluble 
protein is unknown. 

XVI Use and Administration of Antibodies or Other Inhibitory
Molecules

Antibodies, inhibitors, receptors or antagonists of KIN
fragments (or other treatments to limit signal transduction,
TSTs) can provide different effects when administered
therapeutically. TSTs will be formulated in a nontoxic, inert,
pharmaceutically acceptable aqueous carrier medium
preferably at a pH of about 5 to 8, more preferably 6 to 8,
although the pH may vary according to the characteristics of
the antibody, inhibitor, or antagonist being formulated and
the condition to be treated. Characteristics of TSTs include
solubility of the molecule, half-life and
antigenicity/immunogenicity; these and other characteristics may aid in
defining an effective carrier. Native human proteins are
preferred as TSTs, but organic or synthetic molecules
resulting from drug screens may be equally effective in particular
situations.

TSTs may be delivered by known routes of administration
including but not limited to topical creams and gels;
transmucosal spray and aerosol; transdermal patch and bandage;
injectable, intravenous and lavage formulations; and orally 
administered liquids and pills particularly formulated to 
resist stomach acid and enzymes. The particular 
formulation, exact dosage, and route of administration will 
be determined by the attending physician and will vary 
according to each specific situation. 

Such determinations are made by considering multiple 
variables such as the condition to be treated, the TST to be 
administered, and the pharmacokinetic profile of the
particular TST. Additional factors which may be taken into
account include disease state (e.g. severity) of the patient, 
age, weight, gender, diet, time and frequency of 
administration, drug combination, reaction sensitivities, and 
tolerance/response to therapy. Long acting TST formula- 
tions might be administered every 3 to 4 days, every week, 
or once every two weeks depending on half-life and clear- 
ance rate of the particular TST. 

Normal dosage amounts may vary from 0.1 to 100,000
micrograms, up to a total dose of about 1 g, depending upon
the route of administration. Guidance as to particular
dosages and methods of delivery is provided in the literature.
See U.S. Pat. Nos. 4,657,760; 5,206,344; or 5,225,212. Those
skilled in the art will employ different formulations for
different TSTs. Administration to cells such as nerve cells
necessitates delivery in a manner different from that to other
cells such as vascular endothelial cells.

It is contemplated that disorders or diseases which trigger 
defensive signal transduction may precipitate damage that is 
treatable with TSTs. These disorders or diseases may be 
specifically diagnosed by the tests discussed above, and such 
testing should be performed in cases where physiologic or 
pathologic problems are suspected to be associated with 
abnormal signal transduction. 

All publications and patents mentioned in the above 
specification are herein incorporated by reference. Various 
modifications and variations of the described method and
system of the invention will be apparent to those skilled in 
the art without departing from the scope and spirit of the 
invention. Although the invention has been described in 
connection with specific preferred embodiments, it should 
be understood that the invention as claimed should not be 
unduly limited to such specific embodiments. Indeed, vari- 
ous modifications of the above-described modes for carrying 
out the invention which are obvious to those skilled in the 
field of molecular biology or related fields are intended to be 
within the scope of the following claims. 



TABLE 1

Clone     Library               GenBank/SwissProt Identifier, Name

297       U937                  P00540 Mouse protooncogene ser/thr kinase
1622      U937                  HUMCLK3B clk3 gene product
10007     THP-1 Phorbol LPS     HSPLK1 protein kinase
12702     THP-1 Phorbol LPS     RATSCPK ser/thr kinase
23789     Inflamed Adenoid      CHKFRNK chicken tyr kinase
35652     HUVEC                 KEK5 Chicken Y kinase receptor
35855     HUVEC                 HUMANBTK37 tyr kinase
40194     T + B Lymphoblast     KRB1 VARV Variola virus protein kinase
42170     T + B Lymphoblast     HSU09564 serine kinase
46081     Corneal Stroma        YSCKIN1 yeast protein kinase
46651     Corneal Stroma        CDK4, P11802
53840     Fibroblast            HSDAPK Death-associated protein kinase
54065     Fibroblast            SCPROKIN 1 yeast 35.6 kD
56494     Fibroblast            KLMC RAT, myosin light chain kinase
58029     Skeletal Muscle       ATHCTRIA 1 A. thaliana Y kinase receptor
64663     Placenta              KIN3 Yeast protein kinase P22209
67967     HUVEC Sheer Stress    YAK1 Yeast protein kinase
68963     HUVEC Sheer Stress    KATK Human Y kinase
71904     Placenta              KIN3 P22209SWP
75289     THP-1 Phorbol         HSU08023 Avian retrovirus ipl30
81865     Rheumatoid Synovium   SNF1 Yeast C catabolite derepressing
82056     HUVEC Sheer Stress    P34314 C. elegans ser/thr kinase
108485    AML Blast             KAPA Pig cAMP-dependent protein kinase
114973    Testis                CC2B ARATH Mouse-ear cress cdc
118591    Skeletal Muscle       PB0192 mixed lineage kinase 1
119819    Skeletal Muscle       HSU09564 ser kinase
120376    Skeletal Muscle       U01064 Y kinase
132750    Bone Marrow           M1JC2 mixed lineage kinase 2
140052    T Lymphocyte          G-protein coupled receptor kinase
146392    T Lymphocyte          SCYAK1 Yeast Yak1 kinase
156108    THP-1 Phorbol LPS     U01064 Dictyostelium Y kinase
173627    Bone Marrow           MMU14166 Kiz
181971    Placenta              HUMTICR Y kinase receptor
182538    Placenta              HSNEK2R kinase
184416*   Cardiac Muscle        KPKS Human proto-oncogene Ser/Thr kinase
191283    Rheumatoid Synovium   RATSGPK Ser/Thr kinase
192268    Rheumatoid Synovium   ATHAPKIA Ser/Thr kinase
214915    Stomach               XLMPK2K Map kinase
223163    Pancreas              TGF-β receptor ser/thr kinase
237002    Small Intestine       P16227 Mouse Y kinase blk
239990    Hippocampus           SHC Human transforming protein
240142    Hippocampus           HSNEK2R
275781    Testes                BOVCKIA casein kinase
285465    Eosinophils           DDIMLCK myosin light chain kinase



SEQUENCE LISTING

( 1 ) GENERAL INFORMATION:

( i i i ) NUMBER OF SEQUENCES: 45

( 2 ) INFORMATION FOR SEQ ID NO:1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 526 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: U937
( B ) CLONE: 297

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ACAAGGGTTG TAATTAAAGG CGATTTTGAA ACAATTAAAA TCTGTGATGT AGGAGTCTCT 60 

CTACCACTGG ATGAAAATAT GACTGTGACT GACCCTGAGG CTTOTTACAT TGGCACAGAG 120 

CCATGCAAAC CCAAAGAAGC TGTGGAGGAG AATGGTGTTA TTACTGCAAG GCAGACATAT 180 

TTGCCTTTGG CTTACTTTGT GGGAAATGAT GACTTTATCG ATTCCACACA TTAATCTTTC 240 

AAATGATGAT GATGATGAAG TAAAAACTTT TTGATGAAAA GTAATTTTGA TGTTGAAGCA 300 

TTACTATGCA AGCCCTTTGG ACCTAAGGCC ACCCTATTTT AATATTGGAG GACCTTGGTG 360 

AATCATACCC AGGAAGGTAA TTTGACCTCT TCTCTGATCA CCCTTATTGA AGCCCCCAAG 420 

CACCCTTCTT GTGACAATTT TAGGTTGGAC CAGTTGCTTT GGGCCAACTT AACTAAAGTT 480 

GTTCGAAAAA CTTTTTTCCA AAAATTTCCA TAGGCCTCCC AAGTTT 526 

( 2 ) INFORMATION FOR SEQ ID NO:2:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 378 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: U937
( B ) CLONE: 1622

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:2:

AGAACACCAC ATCCGAGTGC CTGACTTTGG CAGTCCCACA TTTGACCATG AGCACCACAC 60 

CACCATTGTG GCCACCCGTC ACTATCGCCG CCTGAGGTGA TCCTTGAGCT GGGCTGGGCA 120 

CAGCCTGGTG ACGTCTGGGC ATTGGCTGCA TTCTCTTTGA GTACTACCGG GGCTTCACAC 180 

TCTTCCAGAC CCACGAAAAC CGAGAGCACC TGGTGATGAT GGAGAAGATC CTAGGGCCCA 240 

TCCCATCACA CATGATCCAC CGTACCAGGA AGCAGAATAT TTCTACAAAG GGGGCCTAGT 300 

TTGCCATGGA CAGCTCTTAC GGCCGGTATG TAAGGGACTC AAACCTTTAA GGTTCATGTT 360 

CAACCTTCCT GGGAAGTG 378 

( 2 ) INFORMATION FOR SEQ ID NO:3:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 326 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: THP-1 Phorbol LPS
( B ) CLONE: 10007

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:3:

GGGCTGGCAG CCCGCTTGGA CCCTCCGGAG CAGAGGAAGA AGACCATCTT GGCACCCCCA 60 
ACTATGTGGC TCCAGAACTG CTGCTGAGAC AGGGCCACGG CCCTGAGGCG GATGTATGGT 120

CACTGGGCTG TGTCATGTAC ACCCTGCTCT GCGGGACCCT CCCTTTGAGA CGGCTGACCT 180 

GAAGGAGACG TACCGCTGCA TCAAGAAGGT TCACTACAAC GGTGCCTGCC AGCTCTTAAT 240 

GCCTGCCCGA GTCCTTGGCC GCAATCCTTC GCCCCTTAAC CCGAGAACCG GCCCTCTATT 300 

GACAGATCCT TGCGGCAATT AACTTT 326 

( 2 ) INFORMATION FOR SEQ ID NO:4:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 257 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: THP-1 Phorbol LPS
( B ) CLONE: 12702

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:4:

CCGCAACACA CCTCCTGGAG CGCCTCCTGA GAACGACAGG CAAAGGGCTG GGCCAAGGAT 60 

GACTTCATGG AGATTAAGAG TCATGTTTCT TCTCCTTAAT TAACTGGGAT GATCTCATTA 120 

ATAAGAAGAT TACTCCCCCT TTTACCCAAA TGTGAGTGGG CCCAACGCCT ACGGACTTTG 180 

CCCCGAGTTT ACGAAGACCC TTCCCCAATC CATTGGAAGT CCCCTGAAAG GTCCTATACA 240 

AGTCAGTTAA GGAAGTT 257 

( 2 ) INFORMATION FOR SEQ ID NO:5:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 252 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: Inflamed Adenoid
( B ) CLONE: 23789

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:5:

GTGAAGAATG TGGGGCTGAC CCTCGGAAGT CATCGGGAGC GTGGATGATC TCCTGCCTTC 60 

CTTGCCGTCA TCTCACGGAC AGAGATCGAG GGCACCCAGA AACTGCTCAA CAAAGACCTG 120 

GCAGAGCTCA TCAACAAGAT GCGCTGGCGC AAGAACGCGT GACCTCCCTG TAGGAGTAAG 180 

AGGCAGATCT GACGGTTCAC AACCCTGGCT GTGACGCAAG AACCTCTTAC GTGTGCCACG 240 

CCCAAACTTC TG 252 

( 2 ) INFORMATION FOR SEQ ID NO:6:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 255 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Huvec
( B ) CLONE: 35652

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CAAAATCGTG GCCCGGAGAA TGCCGCGGCC TCAACCCTCT CCTGGACCAG CGGCAGCTCA 60

CTACTCAGCT TTTGGCCTGT GGGCGAGTGG CTTCGGGCCA TCAAAATGGG AAGATACGAA 120 

GAAACTTTCG CAGCCGCTGC CTTTGGCTCC TTCAGCTGGT CAGCCAGATC TCTGCTGACG 180 

ACCTGCTCCG AATCGAGTCA CTCTGGCGGG ACACCAGAAG AAAATTTGGC CAGTTCCAGC 240 

ACATGAGTCC CAGGT 255 

( 2 ) INFORMATION FOR SEQ ID NO:7:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 238 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Huvec
( B ) CLONE: 35855

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GAATACCCCA TATACATAGT GACTGATATA TAAGCAATGG CTGCTTGCTG AATACCTGAG 60 

GAGTCACGGA AAAGGCTTAA CCTTCCCAGT CTTAGAAATG TGCTACGATG TCTGTAAGGC 120 

ATGGCCTTCT TGGAGAGTCA CCAATTCATA CACCGGGCTT GGCTGCTCGT AACTGCTTGG 180 

TGGACAGAGA TCTCTGTGTG AAAGTTCTCC ATTTGGATGA CAAGGTATGT TCTTGATG 238 

( 2 ) INFORMATION FOR SEQ ID NO:8:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 261 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: T+B Lymphoblast
( B ) CLONE: 40194

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:8:

AAACAACTTG ATTATTTAGG AATTCCTCTG TTTTATGGAT CTGGTCTGAC TGAATTCAAG 60 

GGAAGAAGTT ACAGATTTAT GGTAATGGAA AGACTAGGAA TAGATTTACA GAAGATCTCA 120 

GGCCAGAATG GTACCTTTAA AAAGTCAACT GTCCTGCAAT TAGGATCCGA ATGTTGGATG 180

TACTGGAATA TATACATGAA AATGAATATG TTCATGGTGA TATAAAAGCA GCAAATCTAC 240 

TTTTGGGTTA CAAAAATCCT T 261 

( 2 ) INFORMATION FOR SEQ ID NO:9:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 242 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: T+B Lymphoblast
( B ) CLONE: 42170

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:9:

TAAGAAACCT GAAGATCGAG CCACTGCTGA AGAATGTCTA AAGCACCCCT GGTTGACACA 60

GAGCAGTATT CAAGAGCCTT CTTTCAGGAT GGAAAAGGCA CTAGAAGAAG CAAATGCCCT 120 

CCAAGAAGGT CATTCTGTGC CTGAAATTAA TTCGGATACC GACAAATCAG AAACCGAGGA 180 

ATCCATTGTA ACCGAAGAGT TAATTGTAGT TACTTCATAT ACTCTAGGGC AATGCAGACA 240 

GT 242

( 2 ) INFORMATION FOR SEQ ID NO:10:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 222 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Corneal Stroma
( B ) CLONE: 46081

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GCAAAGGACA GTCCGCCGAG GTGCTCGGTG GAGTCATGGC ATTCCCTTTT GGAAGACTGG 60 

CCTTGGTGCA AACCCTGGAG AAGGTGCCTA TGGAGAAGTT CAACTTGCTG TAAATAGAGT 120 

AACTAAGAAG CAGTCGCAGT GAAGATTTAG ATATAAGCGT GCCGTAGACT GTCCCGAAAA 180 

TATTAAGTAG ATCTGTATCA ATAAAATGCT AATCATGAAA TT 222 

( 2 ) INFORMATION FOR SEQ ID NO:11:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 225 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( V i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Corneal Stroma 
( B ) CLONE: 46651 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:11:

ATGCTCCGCC AGTGAGAAGG GCGGCTGCCT GAGCGCCTCA CCAGTCCTCA TCACCCAGAT 60 

CCTGTGGCTT TGAGACACCT TCACTTAAGA ACATTTGCCA CTTGACTTAA ACCAGAAACG 120 

TGTTTTGTGG CATCAGCAGA CCCTTTCTCA GGTAAGTTGT GCTTTGCTTT TAGCATACGT 180 

GAGAAGTTGT TCCGCTCCAT TTTGTGGGAC GTCTTTCTTT CCTTG 225 

( 2 ) INFORMATION FOR SEQ ID NO:12:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 256 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Fibroblast
( B ) CLONE: 53840

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:12:

CAGCGCCTTA CATCTCGCAG CCAAGAACAG CCACCATGAA TGCATCAGGA AGCTGCTTCA 60 

TCTAAATGCC CAGCCGAAAG TTTTGACAGC TCTGGGAAAA CAGCTTTACA TTATGCAGCG 120

GCTCAGGOCT GCCTTCAACC TGTGCAGATT CTTGCGAACA CAAGAGCCCC ATAAACCTCA 180

AACATTTGGA TGGGAATATA CCGCTGCTGC TTGCTGTACA AAATGGTCAC AGTGAGATCT 240

GTCACTTTTC CTGGTC 256

( 2 ) INFORMATION FOR SEQ ID NO:13:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 240 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Fibroblast
( B ) CLONE: 54065

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:13:

GTTGACATCT GGTCCCTGGG CATATGGCCA TCGAAATGAT TGAAGGGCAG CCTCATACCT 60 

CAATGAAAAC CCTTGAGAGC CTTCTACCTC ATTGCCACCA ATGGGACCCC AGAACTTCAG 120 

AACCCAGAGA AGCTGTCAGC TATCTTCCGG GACTTTCTGA ACCGCTGTCT CGAGATGGAT 180 

GTGCAGAAGA GAGGTTCAGC TAAAGAGCTG CTACAGCATC AATTCCTGAA GATTGCCAAT 240 

( 2 ) INFORMATION FOR SEQ ID NO:14:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 195 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Fibroblast
( B ) CLONE: 56494

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:14:

AACAGTGAAG AGCTCCGAGA AATTATGGGT ACCCTGATAT GTGGCTCCTG AAATTTAGTT 60 

ATGATCCTAT AAGCATGGCA ACAGATATTG GAGCATTGGA GTGTTAACAT ATGTCATGCT 120 

TACAGGAATA TCACCTTTTT AGGCAATGAT AAACAAGAAA CATTCTTAAA CATCTCACAG 180 

ATGATTTTAA GTTAT 195 

( 2 ) INFORMATION FOR SEQ ID NO:15:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 207 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Skeletal Muscle
( B ) CLONE: 58029

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:15:

GGAGTGTTTA TCGAGCCAAA TGGATATCAC AGGACAAGGA GGTGGCTGTA AAGAAGCTCC 60 

TCAAAATAGA GAAAGAGGCA GAAATACTCA GTGTCCTCAG TCACAGAAAC ATCATCCAGT 120 

TTTATGCAGT AATTTTGAAC CTCCCAACTA TGGCATTGTC ACAGAATATG CTTCTTGGGT 180

CACTCTATGA TTACATTAAC ACTACAA 207

( 2 ) INFORMATION FOR SEQ ID NO:16:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 184 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Placenta
( B ) CLONE: 64663

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:16:

CGGGGTGGTA AAACTTGGAG ATCTTGGGAT TGGCGGTTTT AGCTCAAAAA CCACAGCTGC 60

ACATTCTTTA GTTGGTACGC CTATTCATGT TCCAGAGGAT ACAGAAATGG ATACAACTTC 120 

AAATCTCATC TGGTCTCTTG GCTGTCTACT ATATGGATGG CTGCATTACA AAGTCCTTTC 180 

TATG 184

( 2 ) INFORMATION FOR SEQ ID NO:17:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 206 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: HUVEC Sheer Stress
( B ) CLONE: 67967

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:17:

TGAATTGCTG AGCATAGACC TTTATGAGCT GATTAAAAAA AATAAGTTTC AGGTTTTAGC 60 

GTCCAGTTGG TACGCAAGTT TGCCCAGTCC ATCTTGCAAT CTTTGGTGCC CTCCACAAAA 120 

TAAGATTATT CACTGCGATC TGAGCCAGAA AACATTCTCC TGAAACACCA CGGGCGCAGT 180 

TCAACCAAGG TCATTGACTT TGGGTT 206 

( 2 ) INFORMATION FOR SEQ ID NO:18:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 268 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: HUVEC Sheer Stress
( B ) CLONE: 68963

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:18:

GGGAAGTGGC CAGTTTGGAG TGGTCAGCTG CGCAAGTGGA AGGGGCAGTA TGATGTTGCT 60 

GTTAAGATGA TCAAGGAGGG CTCCATGTCA GAAGATGAAT TTTTCAGGAG GCCCAGACTA 120 

TATGAAACTC AGCCATCCCA AGCTGGTTAA ATTCTATGGA GTGTGTTAAA GGATTACCCC 180 

ATATACATGT GACTAATATA TAGCAATGCT TGCTTTTCTG AATTACCTGG GGAGTCACGG 240 

AAAAAGGACT TTTAACCCTT CCCGCTTG 268

( 2 ) INFORMATION FOR SEQ ID NO:19:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 224 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Placenta
( B ) CLONE: 71904

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:19:

CCTGGGGTGG TAAAACTTGG AGACTTGCCT TGGCCGGTTT TCCACCTCAA AAACCACAGC 60 

TGCACATCCT TTACTTGGTA CGCCTTATTA CATCTTCCAG AGAGATACAT GAAAATGGAT 120 

ACAACTCAAA CTGACATCTG GCCTTTGGCT GTTACTATAT GAATGGCTGC TTACAAAGCC 180 

TTCCTATGGT GACAAAATGA TTTTACTCAT TGTGTAAGAG ATAG 224 

( 2 ) INFORMATION FOR SEQ ID NO:20:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 195 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: THP-1 Phorbol
( B ) CLONE: 75289 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

GCGGGGAATG ACTCCCTATC CTGGGGTCCA GAACCATGAG ATGTATGATA TCTTCTCCAT 60 

GGCCACAGGT TGAAGCAGCC CGAAGACTGC CTGCTGAACT GTATGAAATA ATGTACTCTT 120 

GCTGGAGAAC CGATCCCTTA GACCGCCCCA CCTTTTCATA TTGAGGCTGC AGCTAGAAAA 180 

ACTCTTAGAA AGTTT 195 

( 2 ) INFORMATION FOR SEQ ID NO:21:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 219 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Rheumatoid Synovium
( B ) CLONE: 81865

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:21:

CACACGAGAA GCAGAAACAC GACGGGCGGG TAAGATCGGC CACTACATTC TGGTGACACG 60 

CTGGGGGTCG GCACCTTCGG CAAAGTGAAG GTTGGCAAAC ATGATTGACT GGCATAAAGT 120 

AGCTGTAAGA TACTCATCGA CAGAAGATTC GGAGCCTTGA TGTGGTAGGA AAAATCCCAG 180 

GAAATTCAGA ACCTCAAGCT TTTCACGCAT CCTCATATA 219 

( 2 ) INFORMATION FOR SEQ ID NO:22:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 181 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: HUVEC Sheer Stress
( B ) CLONE: 82056

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:22:

CCACCAAAGA TCTCAAATAA AGTTGATGTG TGGTCGGTGG GTGTATCTCT ATCAGTGTCT 60 

TTATGGAAGG AAGCCTTTTG GCCATAACCA GTCTCAGCAA GACATCCTAC AAGAGAATAC 120 

GATTTTAAAG CTACTGAACT GCAGTTCCCG CCAAAGCCAG TAGTAACACC TGAAGCAAAG 180 

G 181

( 2 ) INFORMATION FOR SEQ ID NO:23:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 218 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( V i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: AML Blast
( B ) CLONE: 108485 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

TATGGTTATA TGGAAGAGAA TGTGACTGGT GGTCGGTTGG GGTATTTTTA TACGAAATGC 60 

TTGTAGGTGA TACACCTTTT TATGCAOATT CTTTGGTTGG AACTTACAGT AAAATTATGA 120 

ACCATAAAAA TTCACTTACC TTTCCTGATG ATAATGACAT ATCAAAAGAA GCAAAAAACC 180 

TTATTTGTGC CTTCCTTACT GACAGGGAAG TGAGGTTA 218 

( 2 ) INFORMATION FOR SEQ ID NO:24:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 264 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Testis
( B ) CLONE: U4973

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:24:

GACGGTGGCC ATTTGACATG TGGAGCCTGG GTGCATCACG GTGGAGTTGT ACACGGGCTA 60 

CCCCCTGTTC CCCGGGAGAA TGAGGTGGAG CAGCTGGCCT GCATCATGGA GGTGCTGGGT 120 

CTGCCGCCAG CCGGCTTCAT TCAGACAGCC TCCAGGAGAC AGACATTCTT TGATTCCAAA 180 

GGTTTTCCTA AAAATATAAC CACAACCAGG GGAAAAAAAG ATTCCAGATT CCAAGGGCCC 240 

TCACGGATTG GTGCTGAAAA AACT 264 

( 2 ) INFORMATION FOR SEQ ID NO:25:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 236 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Skeletal Muscle
( B ) CLONE: U8S91

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:25:

GACTGAGCAC ACTGAAACAT CATCCAGTTT TATGGAGTAA TTCTTGAACC TCCCAACTAT 60 

GGCATTGTCA CAGAATATGC TTCTCTGGGA TCACTCTATG ATTACATTAA CAGTAACAGA 120 

AGTGAGGAGA TGGATATGGT CACATTATGA CCTGGGCCAC TGATGTAGCC AAAGGAATGC 180 

ATTATTTACA TATGGGGCTC CTGTCAAGGT GATTCACAGA GACCTCAAGT CAAGGA 236 

( 2 ) INFORMATION FOR SEQ ID NO:26: 

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 200 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Skeletal Muscle
( B ) CLONE: U9819 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

CCTGCATGGC CTTCGAGCTC GCCACTGGTG ACTACCTGTT CGAGCCGCAT TCTGGAGAAG 60 

ACTACACTCG TGATGAGGGT AAGGGGTGAG GGCTCTGGGC TCAGCCTCCC GGCCTCCCGG 120 

CCTGCCTGCC CCCAACCTCC TCTTTTGCCC ACAGACCACA TCGCTCACAT AGTGGAGCTT 180 

CTGGGGGACA TCCCCCCAGC 200 

( 2 ) INFORMATION FOR SEQ ID NO:27:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 217 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Skeletal Muscle
( B ) CLONE: 120376

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:27:

GATTACAAGT AGCTTGGTTG TAGTGGAAAA AAACGAGAGA TTAACCATTC CAAGCAGTTG 60 

CCCCAGAAGT TTTGCTGAAC TTTACATCAG TTTGGGAAGC TGATGCCAAG AAACGGCCAT 120 

CATTCAAGCA AATCATTTCA ATCCTGGGTC CATGTCAAAT GACACGAGCC TTCCTGCAAC 180 

TGTAACTCAT TCCTACACAA CAAGGCGGAG TGGAGGT 217 

( 2 ) INFORMATION FOR SEQ ID NO:28:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 156 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Bone Marrow
( B ) CLONE: U2750

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:28:

GTACATTTGA CTCTGTTGTT TTCTCTCGTA GTTCCCAAAC TCATGGAAGT CTGTTTTTAT 60 

CAATATGATG TAAAGTCTGA AATATACAGC TTTGGAATCG TCCTCTGGGA AATCGCCACT 120 

GGAGATATCC CGTTTCAAGG CTGTAATTCT CAGAAG 156 



( 2 ) INFORMATION FOR SEQ ID NO:29:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 224 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: T Lymphocyte
( B ) CLONE: 140052 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:29: 



TGTAAATAAG GCCCTTCTCC ACTTGACTTC AGGCAGCAGA TTGTCTAGAA GCCTAAGGAC 60 

AGCAATTTCT CTGACAAGAC AAAGTAGATA TTTTATACCA GGGGTTGGCA AACTACTGCC 120 

CACGGGCCGA ATTTGGCCCA GTCTGTTTTT GTATGGTGCA AACTAAAAAT GATTTTTACA 180 

TTTTTAAACA GTTATAAAAG AAAAAAATAT GTGGTCTGTG AAAT 224 



( 2 ) INFORMATION FOR SEQ ID NO:30:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 198 base pairs
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( V i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: T Lymphocyte
( B ) CLONE: 146392 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:30: 



TTTTCTTTGT GTTTTTTTTT GTTCCAGTTT ATTTTAAATG CATATTTTAG TTGATTGCTT 60 

TTTTAAAAAG CCCCCTCTGG CCTCCTGATT CCAGCTAGTG TCAGCAGTGG GATACCTGCG 120 

CTTGAAGGAC ATCATCCACC GTGACATCAA GGATGAGAAC ATCGTGATCG CCGAGGACTT 180 

CACAATCAAG CTGATACT 198 



( 2 ) INFORMATION FOR SEQ ID NO:31:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 210 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: THP-1 Phorbol LPS
( B ) CLONE: 156108

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:31:

TGAAAACTAT GAACCTGGAC AAAAATCAAG GGCCAGTATC AAGCACGATA TATATAGCTA 60 

TGCAGTTATC ACATGGGAAG TGTTATCCAG AAAACAGCCT TTTGAAGATG TCACCAATCC 120

TTTGCAGATA ATGTATAGTG TGTCACAAGG ACATCGACCT GTTATTAATG AAGAAAGTTT 180

GCCATATGAT ATACCTCACC GAGCACGTAT 210



( 2 ) INFORMATION FOR SEQ ID NO:32:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 202 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Bone Marrow
( B ) CLONE: 173627

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:32:



AGAAGATCGG GGCCGGCTTC TTCTCTGAGG TCTACAAGGT TCGGCACCGA CAGTCAGGGC 60 

AAGTATGGTG CTGAAGATGA ACAAGCTCCC CAGTAACCGG GGCAACACAC TACGGGAACT 120 

GCAGCTGATG AACCGGCTCA GGCACCCCAA CATCCTAAGG TTCATGGGAG TCTGTGTGCA 180 

CCAGGCACAG CTGCACGCTC TT 202 



( 2 ) INFORMATION FOR SEQ ID NO:33: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 222 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Placenta
( B ) CLONE: 181971

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:33: 



CGTTTTTGGA GGGTTCACAC CTGTCCCTTT CAAATGCTGG CGCTTTCACA CACTCCTTCT 60 

CTCCTGCCAG CACCTTCTGG TCTCAGGAGC ATTGCAGGAT GTTGTGTGAG TAAGTATGGG 120 

AGACACTTTA GTATGGCTTT TTTCAGCTTA GCCTCCTGTT ATCAGACAGC AGTCTCTTTC 180 

AGTGTCAAGG TTTGAGTACT AGATGGTGGA GAAAGCCTGT TT 222 



( 2 ) INFORMATION FOR SEQ ID NO:34:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 192 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Placenta
( B ) CLONE: 182538 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:34: 



CTTGGGGTGG TAAAACTTGG AGATCTTGGG CTTGGCCGGT TTTTCAGCTC AAAAACCACA 60 

GCTGCACATT CTTTAGTTGG TACGCCTTAT TACATGTCTC CAGAGAGAAT ACATGAAAAT 120 

GGATACAACT TCAAATCTGA CATCTGGTCT CTTGGCTGTC TACTATATGA CATGGCTGCA 180 

TTACAAACTC CT 192

( 2 ) INFORMATION FOR SEQ ID NO:35:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 152 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Cardiac Muscle
( B ) CLONE: 184416

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:35:

CTATGGAAGG CCGCTGGCAG GGCAATGACA TTGTCGTGAA GGTGCTGAAG GTTCGAGACT 60 

GGAGTACAAG GAAGAGCAGG GACTTCAATG AACAGTGTCC CCGGCTCAGG ATTTTTCGCA 120 

TCCAAATGTG CTCCCAGTGC TAGGTGCCTG CC 152 

( 2 ) INFORMATION FOR SEQ ID NO:36: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 152 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Rheumatoid Synovium
( B ) CLONE: 191283 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

CAACTACAGT GAACCTAAAA TGCCTCTAAT ACCTTTGCAA TTATCTTTAA GAGGATATCT 60 

TATGAGTGAA ATTAACTTGT CCAACTACTT TCCTATTCAC TTTTTTACAG AGACTTAAAA 120 

CCAGAGAATA TTTCTAGATT CACAGGGACA CT 152 

( 2 ) INFORMATION FOR SEQ ID NO:37:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 199 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Rheumatoid Synovium
( B ) CLONE: 192268

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:37:

AGTGGACTGC AGTAAGCAGA GCTTCCTGAC CGAGGTGGAG CAGCTGTCCA GGTTTCGTCA 60 

CCCAAACATT GTGGACTTTC TGGCTACTGT GCTCAGAACG GCTTCTACTG CCTGGTGTAC 120 

GGCTTCCTGC CCAACGGCTC CCTGGAGGAC CCTTCCACTG CCAGACCCAG GCCTGCCCAC 180 

CTCTCTCCTG CCCTCAGCG 199 

( 2 ) INFORMATION FOR SEQ ID NO:38:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 189 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 



( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Stomach
( B ) CLONE: 214915

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:38: 

AGAAGATCCA GTACCTGGTG TATCAATGCT CAAAGGCCTT AAGTACATCC ACTCTCTGGG 60 

GTCGTGCACA GGGACCTGAA GCCAGGCAAC CTGGCTGTGA ATAGGACTGT AACTGAAGAT 120 

TCTGGATTTT GGGCTGGCGC GACATGCAGA CCCCGAGATG ACTGGCTACG TGGTGACCCG 180 

CTGGTACCT 189 

( 2 ) INFORMATION FOR SEQ ID NO:39:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 167 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Pancreas
( B ) CLONE: 223163

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:39:

CTTGCTCTTC TGACAGGATG AGAGTTATTA TAAGCAAATC CTACCTAGAG GCTTTTAACT 60 

CTAATGGGAA TAACTTGCAA CTAAAAGACC CAACTTGCAG ACCAAAATTA TCAAATGTTC 120 

TGGATTTTCT GTCCCTCTTA ATGGATGTGG TACAATCAGA AAGGTAG 167 

( 2 ) INFORMATION FOR SEQ ID NO:40:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 197 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Small Intestine
( B ) CLONE: 237002

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:40:

CCCAAACCTG CCCAGCCAGC CCTGAAAATG CAAGTTTTGT ACGATTTTGA AGCTAGGAAC 60 

CCACGGGAAC TGACTGTGGT CCAGGGAGAG AAGCTGGAGG TTTGCACCAC AGCAAGCGGT 120 

GGTGGCTGGT GAAGAATAGG CGGGACGGAG CGGCTACATT CCAAGCAACA TCTGGGCCCC 180 

TACAGCCGGG GACCCCG 197 

( 2 ) INFORMATION FOR SEQ ID NO:41:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 207 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Hippocampus
( B ) CLONE: 239990

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:41:

CCAAGATGCT GGAGGAACTC AAGCCGAGAC TTGTACCAAG GAGAGATCAG CAGGAAGGAG 60

GCAGAGGGCT CTGAGAAAGA CGGGACTTCC TGGTCAGGAA GAGCACCACC AACCCGGGCT 120 

CCTTTTCCTC ACGGGCATGC ACAATGGCCA CGCAAGCACC TGCTGCTCTT GGACCCAGAA 180 

GGCACGTCCG GACAAAGGCA GAGTCTT 207 

( 2 ) INFORMATION FOR SEQ ID NO:42:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 195 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( V i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Hippocampus 
( B ) CLONE: 240142 

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:42: 

GTCACCGGAG AGGATCCATG AGAACGGCTA CAACTTCAAG TCCGACATCT GGTCCTTGGG 60 

CTGTCTGCTG TACGAGATGG CAGCCCTCCA GAGCCCCTTC TATGGAGATA AGATGAATCT 120 

TTCTCCCTGT GCCAGAAGAT CGAGCAGTGT GACTACCCCC CACTCCCCGG GGAGCACTAC 180 

TCCGAGAAGT TACGT 195



( 2 ) INFORMATION FOR SEQ ID NO:43:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 213 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Testes
( B ) CLONE: 275781

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:43:

CTCGTCTATT CGGCACGAGT TTCATTGTCG AACGAAATAT AAACTGTCTG GAAGATCTGG 60 

TGTAGCTCCT TCGAGACATC TTTGGCGATC AGCATCACCA ACGGTAAGAA GTGTAGTAAG 120 

CCAGATCTCA GGGCCAGGCA TCCCCAGTTG CTGTACAAGA GCAGGCTTTC AAGATGCTTC 180 

AAGGTCCCTG TCCATCAATA TGCTACACAT TTG 213 



( 2 ) INFORMATION FOR SEQ ID NO:44:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 425 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Eosinophils
( B ) CLONE: 285465

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:44:

AAATACTTGA AGGAGTTTAT TATCTACATC AGAATAACAT TGTACACCTT GATTTAAAGC 60 

CACAGAATAT ATTACTGAGC AGCATATACC CTCTCGGGGA CATTAAAATA GTAGATTTTC 120 

GAATGTCTCG AAAAATAGGG CATGCGTGTG AACTTCGGGA AATCATGGGA ACACCAGAAT 180

ATTTACCTCC AGAAATCCTG AACTATGATC CCATTACCAC AGCAACAGAT ATGTGGAATA 240

TTGGTATAAT AGCATATATG TTGTTAACTC ACACATCACC ATTTGTGGGA GAAGATAATC 300 

AAGAAACATA CCTCAATATC TCTCAAGTTA ATGTAGATTA TTCGGAAGGA ACTTTTTCAT 360 

CAGTTTCACA GCTGGCACAG ACTTTATTCA GAGCTTTTAG TAAAATCAGA GGAAAGGCCC 420 

ACAGC 425

( 2 ) INFORMATION FOR SEQ ID NO:45:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 1851 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( V i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Stomach
( B ) CLONE: 214915E

( X i ) SEQUENCE DESCRIPTION: SEQ ID NO:45:

GCCCGTTGGG CCGCGAACGC AGCCGCCACG CCGGGGCCGC CGAGATCGGG TGCCCGGGAT 60 

GAGCCTCATC CGGAAAAAGG GCTTCTACAA GCAGGACGTC AACAAGACCG CCTGGGAGCT 120 

GCCCAAGACC TACGTGTCCC CGACGCACGT CGGCAGCGGG GCCTATGGCT CCGTGTGCTC 180 

GGCCATCGAC AAGCGGTCAG GGGAGAAGGT GCCCATCAAG AAGCTGAGCC GACCCTTTCA 240 

CTCCGAGATC TTCGCCAAGC GCGCCTACCC GGAGCTGCTG TTGCTGAAGC ACATGCAGCA 300 

TGAGAACGTC ATTGGGCTCC TGGATGTCTT CACCCCAGCC TCCTCCCTGG AACTTCTATG 360 

ACTTCTACCT GGTGATGCCC TTCATGCAGA CCGATCTGCA GAAGATCATG GGGATGGAGT 420 

TCAGTGAGGA GAAGATCCAG TACCTGGTGT ATCAGATGCT CAAAGGCCTT AAGTACATCC 480 

ACTCTGCTGG GGTCGTGCAC AGGGACCTGA AGCCAGGCAA CCTGGCTGTG AATGAGGACT 540 

GTGAACTGAA GATTCTGGAT TTGGGGCTGC CGCGACATGC AGACGCCGAG ATGACTGGCT 600 

ACGTGGTCAC CCGCTGGTAC CGAGCCCCCG AGGTGATCCT CAGCTGCATC CACTACAACC 660 

AGACAGTGGA CATCTGGTCT GTGGGCTGTA TCATGGCAGA GATGCTCACA GGGAAAACTC 720 

TGTTCAAGGG GAAAGATTAC CTGGACCAGC TGACCCAGAT CCTGAAAGTG ACCGGCGTGC 780 

CTGGCACGGA GTTTGTGCAG AAGCTGAACG ACAAAGCGGC CAAATCCTAC ATCCAGTCCC 840 

TGCCACAGAC CCCCAGGAAG GATTTCACTC AGCTGTTCCC ACGGGCCAGC CCCCAGCCTG 900 

CGGACCTGCT GGAGAAGATG CTGGAGCTAC ACGTGGACAA GCGCCTGACG GCCGCGCAGG 960 

CCCTCACCCA TCCCTTCTTT GAACCCTTCC GGGACCCTGA GGAAGAGACG GAGGCCCAGC 1020 

AGCCGTTTGA TGATTCCTTA GAACACGAGA AACTCACAGT GGATGAATGG AAGCAGCACA 1080 

TCTACAAGGA GATTGTGAAC TTCAGCCCCA TTGCCCGGAA GGACTCACGG CGCCGGAGTC 1140 

GCATGAAGCT GTAGGGACTC ATCTTGCATG GCACCGCCGG CCAGACACTG CCCAAGGACC 1200 

AGTATTTGTC ACTACCAAAC TCAGCCCTTC TTGGAATACA GCCTTTCAAG CAGAGGACAG 1260 

AAGGGTCCTT CTCCTTATGT GGGAAATGGG CCTAGTAGAT GCAGAATTCA AAGATGTCGG 1320 

TTGGGAGAAA CTAGCTCTGA TCCTAACAGG CCACGTTAAA CTGCCCATCT GGAGAATCGC 1380 

CTGCAGGTGG GGCCCTTTCC TTCCCGCCAG AGTGGGGCTG AGTGGGCGCT GAGCCAGGCC 1440 

GGGGGCCTAT GGCAGTGATG CTGTGTTGGT TTCCTAGGGA TGCTCTAACG AATTACCACA 1500 

AACCTGCTGG ATTGAAACAG CAGAACTTGA TTCCCTTACA GTTCTGGAGG CTGGAAATCT 1560

GGGATGGAGG TGTTGGCAGG GCTGTGGTCC CTTTGAAGGC TCTGGGGAAG AATCCTTCCT 1620

TGGCTCTTTT TAGCTTGTGG CGGCACTGGC CAGTCCGTGC CATTCCCCAG CTTATTGCTG 1680

CATCACTCCA GTCTCTGTCT CTTCTGTTCT CTCCTCTTTT AACAACACTC ATTGGATTTA 1740

GGGCCCACCC TAATCCTGTG TGATCTTATC TTCATCCTTA TTAATTAAAC CTGCAAATAC 1800

TCTAGTTCCA AATAAAGTCA CATTCTCAGG TAAAAAAAAA AAAAAAAAAA A 1851



We claim: 

1. A purified polynucleotide having a nucleic acid
sequence selected from the group consisting of SEQ ID
NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ
ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ
ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12,
SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID
NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19,
SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID
NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26,
SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID
NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33,
SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID
NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40,
SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, and SEQ
ID NO:44.

2. An expression vector comprising the polynucleotide of 
claim 1. 

3. A host cell transformed with the expression vector of 
claim 2. 

4. A method for producing and purifying a polypeptide, 
said method comprising the steps of: 

a) culturing the host cell of claim 3 under conditions 
suitable for the expression of the peptide; and 

b) recovering the polypeptide from the host cell culture. 

* * * * *

SECRETED PROTEINS AND 
POLYNUCLEOTIDES ENCODING THEM 

FIELD OF THE INVENTION 

The present invention provides novel polynucleotides and
proteins encoded by such polynucleotides, along with
therapeutic, diagnostic and research utilities for these
polynucleotides and proteins.

BACKGROUND OF THE INVENTION 

Technology aimed at the discovery of protein factors
(including e.g., cytokines, such as lymphokines, interferons,
CSFs and interleukins) has matured rapidly over the past
decade. The now routine hybridization cloning and expression
cloning techniques clone novel polynucleotides
"directly" in the sense that they rely on information directly
related to the discovered protein (i.e., partial DNA/amino
acid sequence of the protein in the case of hybridization
cloning; activity of the protein in the case of expression
cloning). More recent "indirect" cloning techniques such as
signal sequence cloning, which isolates DNA sequences
based on the presence of a now well-recognized secretory
leader sequence motif, as well as various PCR-based or low
stringency hybridization cloning techniques, have advanced
the state of the art by making available large numbers of
DNA/amino acid sequences for proteins that are known to
have biological activity by virtue of their secreted nature in
the case of leader sequence cloning, or by virtue of the cell
or tissue source in the case of PCR-based techniques. It is to
these proteins and the polynucleotides encoding them that
the present invention is directed.

SUMMARY OF THE INVENTION 

In one embodiment, the present invention provides a 
composition comprising an isolated polynucleotide selected
from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1;

(b) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1 from nucleotide 247 to nucleotide
432;

(c) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1 from nucleotide 328 to nucleotide
432;

(d) a polynucleotide comprising the nucleotide sequence
of the full length protein coding sequence of clone
BD372_5 deposited under accession number ATCC
98146;

(e) a polynucleotide encoding the full length protein
encoded by the cDNA insert of clone BD372_5 deposited
under accession number ATCC 98146;

(f) a polynucleotide comprising the nucleotide sequence
of the mature protein coding sequence of clone
BD372_5 deposited under accession number ATCC
98146;

(g) a polynucleotide encoding the mature protein encoded
by the cDNA insert of clone BD372_5 deposited under
accession number ATCC 98146;

(h) a polynucleotide encoding a protein comprising the
amino acid sequence of SEQ ID NO:2;

(i) a polynucleotide encoding a protein comprising a
fragment of the amino acid sequence of SEQ ID NO:2
having biological activity;

(j) a polynucleotide which is an allelic variant of a
polynucleotide of (a)-(g) above;

(k) a polynucleotide which encodes a species homologue
of the protein of (h) or (i) above.
Preferably, such polynucleotide comprises the nucleotide
sequence of SEQ ID NO:1 from nucleotide 247 to nucleotide
432; the nucleotide sequence of SEQ ID NO:1 from
nucleotide 328 to nucleotide 432; the nucleotide sequence of
the full length protein coding sequence of clone BD372_5
deposited under accession number ATCC 98146; or the
nucleotide sequence of the mature protein coding sequence
of clone BD372_5 deposited under accession number
ATCC 98146. In other preferred embodiments, the
polynucleotide encodes the full length or mature protein encoded
by the cDNA insert of clone BD372_5 deposited under
accession number ATCC 98146.

Other embodiments provide the gene corresponding to the
cDNA sequence of SEQ ID NO:1 or SEQ ID NO:3.

In other embodiments, the present invention provides a
composition comprising a protein, wherein said protein
comprises an amino acid sequence selected from the group
consisting of:

(a) the amino acid sequence of SEQ ID NO:2;

(b) fragments of the amino acid sequence of SEQ ID
NO:2; and

(c) the amino acid sequence encoded by the cDNA insert
of clone BD372_5 deposited under accession number ATCC
98146;

the protein being substantially free from other mammalian
proteins. Preferably such protein comprises the
amino acid sequence of SEQ ID NO:2.

In one embodiment, the present invention provides a
composition comprising an isolated polynucleotide selected
from the group consisting of:

(a) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:4;

(b) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:4 from nucleotide 316 to nucleotide
501;

(c) a polynucleotide comprising the nucleotide sequence
of the full length protein coding sequence of clone
BR533_4 deposited under accession number ATCC
98146;

(d) a polynucleotide encoding the full length protein
encoded by the cDNA insert of clone BR533_4 deposited
under accession number ATCC 98146;

(e) a polynucleotide comprising the nucleotide sequence
of the mature protein coding sequence of clone
BR533_4 deposited under accession number ATCC
98146;

(f) a polynucleotide encoding the mature protein encoded
by the cDNA insert of clone BR533_4 deposited under
accession number ATCC 98146;

(g) a polynucleotide encoding a protein comprising the
amino acid sequence of SEQ ID NO:5;

(h) a polynucleotide encoding a protein comprising a
fragment of the amino acid sequence of SEQ ID NO:5
having biological activity;

(i) a polynucleotide which is an allelic variant of a
polynucleotide of (a)-(d) above;

(j) a polynucleotide which encodes a species homologue
of the protein of (g) or (h) above.
Preferably, such polynucleotide comprises the nucleotide
sequence of SEQ ID NO:4 from nucleotide 316 to nucleotide
501; the nucleotide sequence of the full length protein
coding sequence of clone BR533_4 deposited under accession
number ATCC 98146; or the nucleotide sequence of the
mature protein coding sequence of clone BR533_4 deposited
under accession number ATCC 98146. In other preferred
embodiments, the polynucleotide encodes the full
length or mature protein encoded by the cDNA insert of
clone BR533_4 deposited under accession number ATCC
98146.

Other embodiments provide the gene corresponding to the
cDNA sequence of SEQ ID NO:4 or SEQ ID NO:6.

In other embodiments, the present invention provides a
composition comprising a protein, wherein said protein
comprises an amino acid sequence selected from the group
consisting of:

(a) the amino acid sequence of SEQ ID NO:5;

(b) fragments of the amino acid sequence of SEQ ID
NO:5; and

(c) the amino acid sequence encoded by the cDNA insert
of clone BR533_4 deposited under accession number ATCC
98146;

the protein being substantially free from other mammalian
proteins. Preferably such protein comprises the
amino acid sequence of SEQ ID NO:5.

In one embodiment, the present invention provides a
composition comprising an isolated polynucleotide selected
from the group consisting of:

(a) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:7;

(b) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:7 from nucleotide 113 to nucleotide
433;

(c) a polynucleotide comprising the nucleotide sequence
of the full length protein coding sequence of clone
CC288_9 deposited under accession number ATCC
98146;

(d) a polynucleotide encoding the full length protein
encoded by the cDNA insert of clone CC288_9 deposited
under accession number ATCC 98146;

(e) a polynucleotide comprising the nucleotide sequence
of the mature protein coding sequence of clone
CC288_9 deposited under accession number ATCC
98146;

(f) a polynucleotide encoding the mature protein encoded
by the cDNA insert of clone CC288_9 deposited under
accession number ATCC 98146;

(g) a polynucleotide encoding a protein comprising the
amino acid sequence of SEQ ID NO:8;

(h) a polynucleotide encoding a protein comprising a
fragment of the amino acid sequence of SEQ ID NO:8
having biological activity;

(i) a polynucleotide which is an allelic variant of a
polynucleotide of (a)-(d) above;

(j) a polynucleotide which encodes a species homologue
of the protein of (g) or (h) above.

Preferably, such polynucleotide comprises the nucleotide
sequence of SEQ ID NO:7 from nucleotide 113 to nucleotide
433; the nucleotide sequence of the full length protein
coding sequence of clone CC288_9 deposited under accession
number ATCC 98146; or the nucleotide sequence of the
mature protein coding sequence of clone CC288_9 deposited
under accession number ATCC 98146. In other preferred
embodiments, the polynucleotide encodes the full
length or mature protein encoded by the cDNA insert of
clone CC288_9 deposited under accession number ATCC
98146. In yet other preferred embodiments, the present
invention provides a polynucleotide encoding a protein
comprising the amino acid sequence of SEQ ID NO:8 from
amino acid 1 to amino acid 77.

Other embodiments provide the gene corresponding to the
cDNA sequence of SEQ ID NO:7.

In other embodiments, the present invention provides a
composition comprising a protein, wherein said protein
comprises an amino acid sequence selected from the group
consisting of:

(a) the amino acid sequence of SEQ ID NO:8;

(b) the amino acid sequence of SEQ ID NO:8 from amino
acid 1 to amino acid 77;

(c) fragments of the amino acid sequence of SEQ ID
NO:8; and

(d) the amino acid sequence encoded by the cDNA insert
of clone CC288_9 deposited under accession number ATCC
98146;

the protein being substantially free from other mammalian
proteins. Preferably such protein comprises the
amino acid sequence of SEQ ID NO:8 or the amino acid
sequence of SEQ ID NO:8 from amino acid 1 to amino acid
77.

In certain preferred embodiments, the polynucleotide is
operably linked to an expression control sequence. The
invention also provides a host cell, including bacterial,
yeast, insect and mammalian cells, transformed with such
polynucleotide compositions.

Processes are also provided for producing a protein,
which comprise:

(a) growing a culture of the host cell transformed with
such polynucleotide compositions in a suitable culture
medium; and

(b) purifying the protein from the culture.

The protein produced according to such methods is also
provided by the present invention. Preferred embodiments
include those in which the protein produced by such process
is a mature form of the protein.

Protein compositions of the present invention may further
comprise a pharmaceutically acceptable carrier. Compositions
comprising an antibody which specifically reacts with
such protein are also provided by the present invention.

Methods are also provided for preventing, treating or
ameliorating a medical condition which comprises administering
to a mammalian subject a therapeutically effective
amount of a composition comprising a protein of the present
invention and a pharmaceutically acceptable carrier.

DETAILED DESCRIPTION

ISOLATED PROTEINS AND POLYNUCLEOTIDES

Nucleotide and amino acid sequences are reported below
for each clone and protein disclosed in the present application.
In some instances the sequences are preliminary and
may include some incorrect or ambiguous bases or amino
acids. The actual nucleotide sequence of each clone can
readily be determined by sequencing of the deposited clone
in accordance with known methods. The predicted amino
acid sequence (both full length and mature) can then be
determined from such nucleotide sequence. The amino acid
sequence of the protein encoded by a particular clone can
also be determined by expression of the clone in a suitable
host cell, collecting the protein and determining its
sequence.

For each disclosed protein applicants have identified what
they have determined to be the reading frame best identifiable
with sequence information available at the time of
filing. Because of the partial ambiguity in reported sequence
information, reported protein sequences include "Xaa"
designators. These "Xaa" designators indicate either (1) a
residue which cannot be identified because of nucleotide
sequence ambiguity or (2) a stop codon in the determined
nucleotide sequence where applicants believe one should not
exist (if the nucleotide sequence were determined more
accurately).

As used herein a "secreted" protein is one which, when
expressed in a suitable host cell, is transported across or
through a membrane, including transport as a result of signal
sequences in its amino acid sequence. "Secreted" proteins
include without limitation proteins secreted wholly (e.g.,
soluble proteins) or partially (e.g., receptors) from the cell in
which they are expressed. "Secreted" proteins also include
without limitation proteins which are transported across the
membrane of the endoplasmic reticulum.

Clone "BD372_5"
A polynucleotide of the present invention has been identified
as clone "BD372_5". BD372_5 was isolated from a
human fetal kidney cDNA library using methods which are
selective for cDNAs encoding secreted proteins. BD372_5
is a full-length clone, including the entire coding sequence
of a secreted protein (also referred to herein as "BD372_5
protein").

The nucleotide sequence of the 5' portion of BD372_5 as
presently determined is reported in SEQ ID NO:1. What
applicants presently believe is the proper reading frame for
the coding region is indicated in SEQ ID NO:2. The predicted
amino acid sequence of the BD372_5 protein corresponding
to the foregoing nucleotide sequence is reported in SEQ ID
NO:2. Amino acids 1 to 27 are the predicted leader/signal
sequence, with the predicted mature amino acid sequence
beginning at amino acid 28. Additional nucleotide sequence
from the 3' portion of BD372_5, including the polyA tail, is
reported in SEQ ID NO:3.

The EcoRI/NotI restriction fragment obtainable from the
deposit containing clone BD372_5 should be approximately
2300 bp.

The nucleotide sequence disclosed herein for BD372_5
was searched against the GenBank database using BLASTA/
BLASTX and FASTA search protocols. BD372_5 demonstrated
at least some identity with ESTs identified as
"yc90f12.s1 Homo sapiens cDNA clone 232783" (R39276,
BlastN) and "EST05537 Homo sapiens cDNA clone
HFBEM26" (T07647, Fasta). Based upon identity,
BD372_5 proteins and each identical protein or peptide
may share at least some activity.

Clone "BR533_4"
A polynucleotide of the present invention has been identified
as clone "BR533_4". BR533_4 was isolated from a
human fetal kidney cDNA library using methods which are
selective for cDNAs encoding secreted proteins. BR533_4
is a full-length clone, including the entire coding sequence
of a secreted protein (also referred to herein as "BR533_4
protein").

The nucleotide sequence of the 5' portion of BR533_4 as
presently determined is reported in SEQ ID NO:4. What
applicants presently believe is the proper reading frame for
the coding region is indicated in SEQ ID NO:5. The predicted
amino acid sequence of the BR533_4 protein corresponding
to the foregoing nucleotide sequence is reported in SEQ ID
NO:5. Additional nucleotide sequence from the 3' portion of
BR533_4, including the polyA tail, is reported in SEQ ID
NO:6.

The EcoRI/NotI restriction fragment obtainable from the
deposit containing clone BR533_4 should be approximately
2850 bp.

The nucleotide sequence disclosed herein for BR533_4
was searched against the GenBank database using BLASTA/
BLASTX and FASTA search protocols. BR533_4 demonstrated
at least some homology with murine semaphorin E
(X85994, BlastN). BR533_4 also shows at least some
identity with an EST identified as "yy50d10.s1 Homo
sapiens cDNA clone 279859 3'" (N38844, BlastN). Based
upon homology, BR533_4 proteins and each homologous
protein or peptide may share at least some activity.

Clone "CC288_9"
A polynucleotide of the present invention has been identified
as clone "CC288_9". CC288_9 was isolated from a
human adult brain cDNA library using methods which are
selective for cDNAs encoding secreted proteins. CC288_9
is a full-length clone, including the entire coding sequence
of a secreted protein (also referred to herein as "CC288_9
protein").

The nucleotide sequence of CC288_9 as presently determined
is reported in SEQ ID NO:7. What applicants presently
believe to be the proper reading frame and the predicted
amino acid sequence of the CC288_9 protein
corresponding to the foregoing nucleotide sequence is
reported in SEQ ID NO:8.

The nucleotide sequence disclosed herein for CC288_9
was searched against the GenBank database using BLASTA/
BLASTX and FASTA search protocols. No hits were found
in the database.

Deposit of Clones
Clones BD372_5, BR533_4 and CC288_9 were deposited
on Aug. 22, 1996 with the American Type Culture Collection
under accession number ATCC 98146, from
which each clone comprising a particular polynucleotide is
[illegible] composite deposit. Each clone can be
removed from the vector in which it was deposited by
performing an EcoRI/NotI digestion (5' site, EcoRI; 3' site,
NotI) to produce the appropriately sized fragment for such
clone (approximate clone size fragments are identified
below). Bacterial cells containing a particular clone can be
obtained from the composite deposit as follows:

An oligonucleotide probe or probes should be designed to
[illegible] the sequences provided
[illegible] a combination of those sequences. The
[illegible] isolate each full-length clone is identified below, and should
be most reliable in isolating the clone of interest.

Clone      Probe Sequence
BD372_5    SEQ ID NO: 9
BR533_4    SEQ ID NO: 10
CC288_9    SEQ ID NO: 11

In the sequences listed above which include an N at position
[illegible], that position is occupied in preferred probes/primers by a
biotinylated phosphoramidite residue rather than a nucleotide
(such as, for example, that produced by use of biotin
phosphoramidite (1-dimethoxytrityloxy-2-(N-biotinyl-4-
aminobutyl)-propyl-3-O-(2-cyanoethyl)-(N,N-diisopropyl)-
phosphoramidite) (Glen Research, cat. no. 10-1953)).

The design of the oligonucleotide probe should preferably
follow these parameters:

(a) It should be designed to an area of the sequence which
has the fewest ambiguous bases ("N's"), if any;

(b) It should be designed to have a Tm of approx. 80° C.
(assuming 2 degrees for each A or T and 4 degrees for each G
or C).

The oligonucleotide should preferably be labeled with γ-32P
ATP (specific activity 6000 Ci/mmole) and T4 polynucleotide
kinase using commonly employed techniques for
labeling oligonucleotides. Other labeling techniques can
also be used. Unincorporated label should preferably be
removed by gel filtration chromatography or other established
methods. The amount of radioactivity incorporated
into the probe should be quantitated by measurement in a
scintillation counter. Preferably, specific activity of the
resulting probe should be approximately 4e+6 dpm/pmole.
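
Design parameter (b) above estimates the melting temperature of a probe by counting 2 degrees for each A or T and 4 degrees for each G or C, with a target of approximately 80° C. A minimal Python sketch of that counting rule follows. The function names and the example sequence are hypothetical and are not taken from the sequence listing.

```python
def estimate_tm(probe: str) -> int:
    """Estimate Tm in degrees C using 2 per A or T and 4 per G or C.

    Bases other than A, C, G and T (for example N) add nothing here.
    """
    seq = probe.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def ambiguous_bases(probe: str) -> int:
    """Count bases that are not A, C, G or T, as in parameter (a)."""
    return sum(1 for base in probe.upper() if base not in "ACGT")


# Made-up 27-mer used only to exercise the functions.
example = "ATGCGGCTAGCTTGACCGTAGGCTAAC"
print(estimate_tm(example), ambiguous_bases(example))
```

A candidate probe region could then be screened by preferring windows with the fewest ambiguous bases and an estimate close to the stated target.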

The bacterial culture containing the pool of full-length
clones should preferably be thawed and 100 μl of the stock
used to inoculate a sterile culture flask containing 25 ml of
sterile L-broth containing ampicillin at 100 μg/ml. The
culture should preferably be grown to saturation at 37° C.,
and the saturated culture should preferably be diluted in
fresh L-broth. Aliquots of these dilutions should preferably
be plated to determine the dilution and volume which will
yield approximately 5000 distinct and well-separated colonies
on solid bacteriological media containing L-broth containing
ampicillin at 100 μg/ml and agar at 1.5% in a 150
mm petri dish when grown overnight at 37° C. Other known
methods of obtaining distinct, well-separated colonies can
also be employed.
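
The plating step above amounts to simple dilution arithmetic once the density of the saturated culture is known. The sketch below is illustrative only: the culture density and the plated volume are assumed values that do not appear in the text (in practice they would come from the trial platings described above), and the function names are hypothetical.

```python
TARGET_COLONIES = 5000  # per 150 mm dish, as stated in the text


def expected_colonies(cfu_per_ml: float, dilution: float, plated_ml: float) -> float:
    """Colonies expected when plated_ml of a diluted culture is spread.

    dilution is the fraction of the original culture in the diluted
    sample, for example 1e-4 for a 1:10,000 dilution.
    """
    return cfu_per_ml * dilution * plated_ml


def dilution_for_target(cfu_per_ml: float, plated_ml: float,
                        target: float = TARGET_COLONIES) -> float:
    """Dilution that should give `target` colonies for a plated volume."""
    return target / (cfu_per_ml * plated_ml)


# Assumed values for illustration; neither is given in the text.
assumed_density = 2e9   # colony-forming units per ml of saturated culture
assumed_volume = 0.25   # ml spread on one dish

needed = dilution_for_target(assumed_density, assumed_volume)
print(needed, expected_colonies(assumed_density, needed, assumed_volume))
```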

Standard colony hybridization procedures should then be 
used to transfer the colonies to nitrocellulose filters and lyse, 
denature and bake them. 

The filter is then preferably incubated at 65° C. for 1 hour
with gentle agitation in 6x SSC (20x stock is 175.3 g
NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with
NaOH) containing 0.5% SDS, 100 μg/ml of yeast RNA, and
10 mM EDTA (approximately 10 mL per 150 mm filter).
Preferably, the probe is then added to the hybridization mix
at a concentration greater than or equal to 1e+6 dpm/mL.
The filter is then preferably incubated at 65° C. with gentle
agitation overnight. The filter is then preferably washed in
500 mL of 2x SSC/0.5% SDS at room temperature without
agitation, preferably followed by 500 mL of 2x SSC/0.1%
SDS at room temperature with gentle shaking for 15 minutes.
A third wash with 0.1x SSC/0.5% SDS at 65° C. for 30
minutes to 1 hour is optional. The filter is then preferably
dried and subjected to autoradiography for sufficient time to
visualize the positives on the X-ray film. Other known
hybridization methods can also be employed.
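
The quantities in the hybridization protocol above can be checked with two short calculations: how much 20x SSC stock a working solution needs, and how much probe meets the minimum concentration of 1e+6 dpm/mL. The Python below is a sketch under stated assumptions. It uses the approximate probe specific activity of 4e+6 dpm/pmole given earlier for the labeled probe, and the function names are hypothetical.

```python
def stock_volume_ml(final_x: float, final_volume_ml: float,
                    stock_x: float = 20.0) -> float:
    """Volume of concentrated stock needed for a working solution.

    Applies C1 * V1 = C2 * V2, with concentrations given as multiples
    of 1x SSC.
    """
    return final_x * final_volume_ml / stock_x


def probe_pmol(min_dpm_per_ml: float, hyb_volume_ml: float,
               dpm_per_pmol: float) -> float:
    """Picomoles of probe needed to reach a minimum dpm/mL."""
    return min_dpm_per_ml * hyb_volume_ml / dpm_per_pmol


# Volumes and concentrations taken from the protocol text.
prehyb_stock = stock_volume_ml(6, 10)   # 6x SSC, about 10 mL per filter
wash_stock = stock_volume_ml(2, 500)    # 2x SSC, 500 mL per wash
probe_needed = probe_pmol(1e6, 10, 4e6)

print(prehyb_stock, wash_stock, probe_needed)
```

The remaining volume of each working solution is made up by water and the other listed components, which this sketch does not itemize.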

The positive colonies are picked, grown in culture, and 
plasmid DNA isolated using standard procedures. The 
clones can then be verified by restriction analysis, hybrid- 
ization analysis, or DNA sequencing. 

Fragments of the proteins of the present invention which
are capable of exhibiting biological activity are also encompassed
by the present invention. Fragments of the protein
may be in linear form or they may be cyclized using known
methods, for example, as described in H. U. Saragovi, et al.,
Bio/Technology 10, 773-778 (1992) and in R. S. McDowell,
et al., J. Amer. Chem. Soc. 114, 9245-9253 (1992), both of
which are incorporated herein by reference. Such fragments
may be fused to carrier molecules such as immunoglobulins
for many purposes, including increasing the valency of
protein binding sites. For example, fragments of the protein
may be fused through "linker" sequences to the Fc portion
of an immunoglobulin. For a bivalent form of the protein,
such a fusion could be to the Fc portion of an IgG molecule.
Other immunoglobulin isotypes may also be used to generate
such fusions. For example, a protein-IgM fusion would
generate a decavalent form of the protein of the invention.

The present invention also provides both full-length and
mature forms of the disclosed proteins. The full-length form
of such proteins is identified in the sequence listing by
translation of the nucleotide sequence of each disclosed
clone. The mature form of such protein may be obtained by
expression of the disclosed full-length polynucleotide
(preferably those deposited with ATCC) in a suitable mammalian
cell or other host cell. The sequence of the mature
form of the protein may also be determinable from the amino
acid sequence of the full-length form.

The present invention also provides genes corresponding
to the cDNA sequences disclosed herein. The corresponding
genes can be isolated in accordance with known methods
using the sequence information disclosed herein. Such methods
include the preparation of probes or primers from the
disclosed sequence information for identification and/or
amplification of genes in appropriate genomic libraries or
other sources of genomic materials.

Where the protein of the present invention is membrane-bound
(e.g., is a receptor), the present invention also provides
for soluble forms of such protein. In such forms part
or all of the intracellular and transmembrane domains of the
protein are deleted such that the protein is fully secreted
from the cell in which it is expressed. The intracellular and
transmembrane domains of proteins of the invention can be
identified in accordance with known techniques for determination
of such domains from sequence information.

Species homologs of the disclosed polynucleotides and
proteins are also provided by the present invention. Species
homologs may be isolated and identified by making suitable
probes or primers from the sequences provided herein and
screening a suitable nucleic acid source from the desired
species.

The invention also encompasses allelic variants of the
disclosed polynucleotides or proteins; that is, naturally-occurring
alternative forms of the isolated polynucleotide
which also encode proteins which are identical, homologous
or related to that encoded by the polynucleotides.

The isolated polynucleotide of the invention may be
operably linked to an expression control sequence such as
the pMT2 or pED expression vectors disclosed in Kaufman
et al., Nucleic Acids Res. 19, 4485-4490 (1991), in order to
produce the protein recombinantly. Many suitable expression
control sequences are known in the art. General methods
of expressing recombinant proteins are also known and
are exemplified in R. Kaufman, Methods in Enzymology
185, 537-566 (1990). As defined herein "operably linked"
means that the isolated polynucleotide of the invention and
an expression control sequence are situated within a vector
or cell in such a way that the protein is expressed by a host
cell which has been transformed (transfected) with the
ligated polynucleotide/expression control sequence.

A number of types of cells may act as suitable host cells
for expression of the protein. Mammalian host cells include,
for example, monkey COS cells, Chinese Hamster Ovary
(CHO) cells, human kidney 293 cells, human epidermal
A431 cells, human Colo205 cells, 3T3 cells, CV-1 cells,
other transformed primate cell lines, normal diploid cells,
cell strains derived from in vitro culture of primary tissue,
primary explants, HeLa cells, mouse L cells, BHK, HL-60,
U937, HaK or Jurkat cells.

Alternatively, it may be possible to produce the protein in
lower eukaryotes such as yeast or in prokaryotes such as
bacteria. Potentially suitable yeast strains include Saccharomyces
cerevisiae, Schizosaccharomyces pombe,
Kluyveromyces strains, Candida, or any yeast strain capable
of expressing heterologous proteins. Potentially suitable
bacterial strains include Escherichia coli, Bacillus subtilis,
Salmonella typhimurium, or any bacterial strain capable of
expressing heterologous proteins. If the protein is made in
yeast or bacteria, it may be necessary to modify the protein
produced therein, for example by phosphorylation or glycosylation
of the appropriate sites, in order to obtain the
functional protein. Such covalent attachments may be
accomplished using known chemical or enzymatic methods.

The protein may also be produced by operably linking the
isolated polynucleotide of the invention to suitable control
sequences in one or more insect expression vectors, and
employing an insect expression system. Materials and methods
for baculovirus/insect cell expression systems are commercially
available in kit form from, e.g., Invitrogen, San
Diego, Calif., U.S.A. (the MaxBac® kit), and such methods
are well known in the art, as described in Summers and
Smith, Texas Agricultural Experiment Station Bulletin No.
1555 (1987), incorporated herein by reference. As used
herein, an insect cell capable of expressing a polynucleotide
of the present invention is "transformed."

The protein of the invention may be prepared by culturing
transformed host cells under culture conditions suitable to
express the recombinant protein. The resulting expressed
protein may then be purified from such culture (i.e., from
culture medium or cell extracts) using known purification
processes, such as gel filtration and ion exchange chromatography.
The purification of the protein may also include an
affinity column containing agents which will bind to the
protein; one or more column steps over such affinity resins
as concanavalin A-agarose, heparin-toyopearl® or Cibacron
blue 3GA Sepharose®; one or more steps involving
hydrophobic interaction chromatography using such resins
as phenyl ether, butyl ether, or propyl ether; or immunoaffinity
chromatography.

Alternatively, the protein of the invention may also be
expressed in a form which will facilitate purification. For
example, it may be expressed as a fusion protein, such as
those of maltose binding protein (MBP), glutathione-S-transferase
(GST) or thioredoxin (TRX). Kits for expression
and purification of such fusion proteins are commercially
available from New England BioLab (Beverly, Mass.), Pharmacia
(Piscataway, N.J.) and Invitrogen, respectively. The
protein can also be tagged with an epitope and subsequently
purified by using a specific antibody directed to such
epitope. One such epitope ("Flag") is commercially available
from Kodak (New Haven, Conn.).

Finally, one or more reverse-phase high performance
liquid chromatography (RP-HPLC) steps employing hydrophobic
RP-HPLC media, e.g., silica gel having pendant
methyl or other aliphatic groups, can be employed to further
purify the protein. Some or all of the foregoing purification
steps, in various combinations, can also be employed to
provide a substantially homogeneous isolated recombinant
protein. The protein thus purified is substantially free of
other mammalian proteins and is defined in accordance with
the present invention as an "isolated protein."

The protein of the invention may also be expressed as a
product of transgenic animals, e.g., as a component of the
milk of transgenic cows, goats, pigs, or sheep which are
characterized by somatic or germ cells containing a nucleotide
sequence encoding the protein.

The protein may also be produced by known conventional
chemical synthesis. Methods for constructing the proteins of
the present invention by synthetic means are known to those
skilled in the art. The synthetically-constructed protein
sequences, by virtue of sharing primary, secondary or tertiary
structural and/or conformational characteristics with
proteins may possess biological properties in common
therewith, including protein activity. Thus, they may be
employed as biologically active or immunological substitutes
for natural, purified proteins in screening of therapeutic
compounds and in immunological processes for the development
of antibodies.

The proteins provided herein also include proteins characterized
by amino acid sequences similar to those of
purified proteins but into which modifications are naturally
provided or deliberately engineered. For example, modifications
in the peptide or DNA sequences can be made by
those skilled in the art using known techniques. Modifications
of interest in the protein sequences may include the
alteration, substitution, replacement, insertion or deletion of
a selected amino acid residue in the coding sequence. For
example, one or more of the cysteine residues may be
deleted or replaced with another amino acid to alter the
conformation of the molecule. Techniques for such
alteration, substitution, replacement, insertion or deletion
are well known to those skilled in the art (see, e.g., U.S. Pat.
No. 4,518,584). Preferably, such alteration, substitution,
replacement, insertion or deletion retains the desired activity
of the protein.

Other fragments and derivatives of the sequences of
proteins which would be expected to retain protein activity
in whole or in part and may thus be useful for screening or
other immunological methodologies may also be easily
made by those skilled in the art given the disclosures herein.
Such modifications are believed to be encompassed by the
present invention.

USES AND BIOLOGICAL ACTIVITY

The polynucleotides and proteins of the present invention
are expected to exhibit one or more of the uses or biological
activities (including those associated with assays cited
herein) identified below. Uses or activities described for
proteins of the present invention may be provided by administration
or use of such proteins or by administration or use
of polynucleotides encoding such proteins (such as, for
example, in gene therapies or vectors suitable for introduction
of DNA).
Research Uses and Utilities
The polynucleotides provided by the present invention
can be used by the research community for various purposes.
The polynucleotides can be used to express recombinant
protein for analysis, characterization or therapeutic use; as
markers for tissues in which the corresponding protein is
preferentially expressed (either constitutively or at a particular
stage of tissue differentiation or development or in
disease states); as molecular weight markers on Southern
gels; as chromosome markers or tags (when labeled) to
identify chromosomes or to map related gene positions; to
compare with endogenous DNA sequences in patients to
identify potential genetic disorders; as probes to hybridize
and thus discover novel, related DNA sequences; as a source
of information to derive PCR primers for genetic fingerprinting;
as a probe to "subtract-out" known sequences in
the process of discovering other novel polynucleotides; for
selecting and making oligomers for attachment to a "gene
chip" or other support, including for examination of expression
patterns; to raise anti-protein antibodies using DNA
immunization techniques; and as an antigen to raise anti-DNA
antibodies or elicit another immune response. Where
the polynucleotide encodes a protein which binds or potentially
binds to another protein (such as, for example, in a
receptor-ligand interaction), the polynucleotide can also be
used in interaction trap assays (such as, for example, that
described in Gyuris et al., Cell 75:791-803 (1993)) to
identify polynucleotides encoding the other protein with
which binding occurs or to identify inhibitors of the binding
interaction.
The proteins provided by the present invention can similarly
be used in assay to determine biological activity,
including in a panel of multiple proteins for high-throughput
screening; to raise antibodies or to elicit another immune
response; as a reagent (including the labeled reagent) in
assays designed to quantitatively determine levels of the
protein (or its receptor) in biological fluids; as markers for
tissues in which the corresponding protein is preferentially
expressed (either constitutively or at a particular stage of
tissue differentiation or development or in a disease state);
and, of course, to isolate correlative receptors or ligands.
Where the protein binds or potentially binds to another
protein (such as, for example, in a receptor-ligand
interaction), the protein can be used to identify the other
protein with which binding occurs or to identify inhibitors of
the binding interaction. Proteins involved in these binding
interactions can also be used to screen for peptide or small
molecule inhibitors or agonists of the binding interaction.

Any or all of these research utilities are capable of being
developed into reagent grade or kit format for commercialization
as research products.

Methods for performing the uses listed above are well
known to those skilled in the art. References disclosing such
methods include without limitation "Molecular Cloning: A
Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory
Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989,
and "Methods in Enzymology: Guide to Molecular Cloning
Techniques", Academic Press, Berger, S. L. and A. R.
Kimmel eds., 1987.

Nutritional Uses
Polynucleotides and proteins of the present invention can
also be used as nutritional sources or supplements. Such uses
include without limitation use as a protein or amino acid
supplement, use as a carbon source, use as a nitrogen source

nol. 145:1706-1712, 1990; Bertagnolli et al., Cellular
Immunology 133:327-341, 1991; Bertagnolli, et al., J.
Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol.
152:1756-1761, 1994.

Assays for cytokine production and/or proliferation of
spleen cells, lymph node cells or thymocytes include, without
limitation, those described in: Polyclonal T cell
stimulation, Kruisbeek, A. M. and Shevach, E. M. In Current
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1
pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto. 1994;
and Measurement of mouse and human interleukin γ,
Schreiber, R. D. In Current Protocols in Immunology. J. E.
e.a. Coligan eds. Vol 1 pp. 6.8.1-6.8.8, John Wiley and Sons,
Toronto. 1994.

Assays for proliferation and differentiation of hematopoietic
and lymphopoietic cells include, without limitation,
those described in: Measurement of Human and Murine
Interleukin 2 and Interleukin 4, Bottomly, K., Davis, L. S.
and Lipsky, P. E. In Current Protocols in Immunology. J. E.
e.a. Coligan eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and
Sons, Toronto. 1991; deVries et al., J. Exp. Med.
173:1205-1211, 1991; Moreau et al., Nature 336:690-692,
1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A.
80:2931-2938, 1983; Measurement of mouse and human
interleukin 6 - Nordan, R. In Current Protocols in Immunology.
J. E. e.a. Coligan eds. Vol 1 pp. 6.6.1-6.6.5, John
Wiley and Sons, Toronto. 1991; Smith et al., Proc. Natl.
Acad. Sci. U.S.A. 83:1857-1861, 1986; Measurement of
human Interleukin 11 - Bennett, F., Giannotti, J., Clark, S.
C. and Turner, K. J. In Current Protocols in Immunology. J.
E. e.a. Coligan eds. Vol 1 pp. 6.15.1 John Wiley and Sons,
Toronto. 1991; Measurement of mouse and human Interleukin
9 - Ciarletta, A., Giannotti, J., Clark, S. C. and Turner,
K. J. In Current Protocols in Immunology. J. E. e.a. Coligan

and use as a source of carbohydrate. In such cases the protein 3S eds. Vol 1 pp. 6.13. 1, John Wiley and Sons, Toronto. 1991. 

or polynudeotide of the invention can be added to the feed Assays for T-cell clone responses to antigens (which will 

of a particular organism or can be administered as a separate identify, among others, proteins that affect APC-T cell 

solid or liquid preparation, such as in the form of powder, interactions as well as direct T-cell effects by measuring 

pills, solutions, suspensions or capsules. In the case of proliferation and cytokine production) indude, without 

microorganisms, the protdn or polynudeotide of the inven- 40 limitation, those described in: Current Protocols in 

tion can be added to the medium in or on which the Immunology, Ed by J. E. Coligan, A. M. ICruisbeek, D. H. 

microorganism is cultured. Margulies, E. M. Shevach, W Strober, Pub. Greene Publish- 

Cytokine and Cell Proliferation/Differentiation Activity ing Associates and Wiley-Intersdence (Chapter 3, In Vitro 

A protein of the present invention may exhibit cytokine, assays for Mouse Lymphocyte Function; Chapter 6, Cytok- 

cell proliferation (either inducing or inhibiting) or cell 4S ines and their cellular receptors; Chapter 7, Immunologic 

differentiation (either indudng or inhibiting) activity or may studies in Humans); Weinberger et al., Proc. Natl. Acad. Sci. 

induce production of other cytokines in certain cell popu- USA 77:609 1-6095, 1980; Wdnberga: et al., Eur. J. Immun. 

lations. Many protein factors discova-ed to date, induding 11:405-411, 1981;Takai et aL, J. ImmunoL 137:3494-3500, 

all known cytokines, have exhibited activity in one or more 1986; Takai et al., J. ImmunoL 140:508-512, 1988. 

factor dependent cell proliferation assays, and hence the so Inmmne Stimulating or Suppressing Activity 

assays serve as a convenient confirmation of cytokine activ- A protein of the present invention may also exhibit 

ity. The activity of a protein of the present invention is immune stimulating or inmiune suppressing activity, indud- 

evidenced by any one of a number of routine factor depen- ing without limitation the activities for which assays are 

dent cell proliferation assays for cell lines including, without described herein. A protein may be useful in the treatment of 

limitation, 32D, DA2, DAIG, TIO, B9, B9/11, BaF3, MC9/ 55 various immune deficiendes and disorders (induding severe 

G, M+(preB M+), 2E8, RB5, DAI, 123, T1165, HT2, combined immunodefidency (SOD)), e.g., in regulating (up 

CrLL2, TF-1, Mo7e and CMK. or down) growth and proliferation of T and/or B 

The activity of a protein of the invention may, among lymphocytes, as well as effecting the cytolytic activity of 

other means, be measured by the following methods: NK cells and other cell populations. These immune defi- 

Assays for T-cell or thymocyte proliferation indude with- 60 dcndes may be genetic or be caused by vital (e.g., HIV) as 

CHit limitation those described in: Current Protocols in well as bacterial or fungal infections, or may result from 

lomumology, Ed by J. E Coligan, A. M. Kruisbeek, D. H. autoimmune disorders. More specifically, infectious dis- 

Margulies, E. M. Shevach. W. Sfrobcr, Pub. Greene Pub- eases causes by viral, bacterial, fungal or other infection 

lishing Associates and Wiley-Intasdence (Chapter 3, In may be treatable using a protein of the present invention. 

Vitro assays for Mouse Lymphocyte Function 3.1-3.19; 65 indudinginfections by HIV, hepatitis viruses, herpesviruses, 

Chapter 7, Immunologic studies in Humans); Takai et aL, J. mycobacteria, Ldshxnania spp., malaria spp. and various 

LnmunoL 137:3494-3500, 1986; BeitagnollietaL, J. Imniu- fungal infections such as candidiasis. Of course, in this 
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regard, a protein of the present invention may also be useful 
where a boost to the ininmne system generally may be 
desirable, Le., in the treatment of cancer. 

Autoimmune disorders which may be treated using a
protein of the present invention include, for example,
connective tissue disease, multiple sclerosis, systemic lupus
erythematosus, rheumatoid arthritis, autoimmune pulmonary
inflammation, Guillain-Barre syndrome, autoimmune
thyroiditis, insulin dependent diabetes mellitus, myasthenia
gravis, graft-versus-host disease and autoimmune
inflammatory eye disease. Such a protein of the present invention
may also be useful in the treatment of allergic reactions
and conditions, such as asthma (particularly allergic asthma)
or other respiratory problems. Other conditions, in which
immune suppression is desired (including, for example,
organ transplantation), may also be treatable using a protein
of the present invention.

Using the proteins of the invention it may also be possible
to regulate immune responses in a number of ways. Down regulation
may be in the form of inhibiting or blocking an immune
response already in progress or may involve preventing the
induction of an immune response. The functions of activated
T cells may be inhibited by suppressing T cell responses or
by inducing specific tolerance in T cells, or both.
Immunosuppression of T cell responses is generally an active,
non-antigen-specific, process which requires continuous
exposure of the T cells to the suppressive agent. Tolerance,
which involves inducing non-responsiveness or anergy in T
cells, is distinguishable from immunosuppression in that it is
generally antigen-specific and persists after exposure to the
tolerizing agent has ceased. Operationally, tolerance can be
demonstrated by the lack of a T cell response upon reexposure
to specific antigen in the absence of the tolerizing agent.

Down regulating or preventing one or more antigen
functions (including without limitation B lymphocyte antigen
functions (such as, for example, B7)), e.g., preventing
high level lymphokine synthesis by activated T cells, will be
useful in situations of tissue, skin and organ transplantation
and in graft-versus-host disease (GVHD). For example,
blockage of T cell function should result in reduced tissue
destruction in tissue transplantation. Typically, in tissue
transplants, rejection of the transplant is initiated through its
recognition as foreign by T cells, followed by an immune
reaction that destroys the transplant. The administration of a
molecule which inhibits or blocks interaction of a B7
lymphocyte antigen with its natural ligand(s) on immune
cells (such as a soluble, monomeric form of a peptide having
B7-2 activity alone or in conjunction with a monomeric
form of a peptide having an activity of another B lymphocyte
antigen (e.g., B7-1, B7-3) or blocking antibody), prior
to transplantation can lead to the binding of the molecule to
the natural ligand(s) on the immune cells without transmitting
the corresponding costimulatory signal. Blocking B
lymphocyte antigen function in this manner prevents cytokine
synthesis by immune cells, such as T cells, and thus acts as
an immunosuppressant. Moreover, the lack of costimulation
may also be sufficient to anergize the T cells, thereby
inducing tolerance in a subject. Induction of long-term
tolerance by B lymphocyte antigen-blocking reagents may
avoid the necessity of repeated administration of these
blocking reagents. To achieve sufficient immunosuppression
or tolerance in a subject, it may also be necessary to block
the function of a combination of B lymphocyte antigens.

The efficacy of particular blocking reagents in preventing
organ transplant rejection or GVHD can be assessed using
animal models that are predictive of efficacy in humans.
Examples of appropriate systems which can be used include
allogeneic cardiac grafts in rats and xenogeneic pancreatic
islet cell grafts in mice, both of which have been used to
examine the immunosuppressive effects of CTLA4Ig fusion
proteins in vivo as described in Lenschow et al., Science
257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci.
USA, 89:11102-11105 (1992). In addition, murine models
of GVHD (see Paul ed., Fundamental Immunology, Raven
Press, New York, 1989, pp. 846-847) can be used to
determine the effect of blocking B lymphocyte antigen
function in vivo on the development of that disease.

Blocking antigen function may also be therapeutically
useful for treating autoimmune diseases. Many autoimmune
disorders are the result of inappropriate activation of T cells
that are reactive against self tissue and which promote the
production of cytokines and autoantibodies involved in the
pathology of the diseases. Preventing the activation of
autoreactive T cells may reduce or eliminate disease
symptoms. Administration of reagents which block costimulation
of T cells by disrupting receptor:ligand interactions of B
lymphocyte antigens can be used to inhibit T cell activation
and prevent production of autoantibodies or T cell-derived
cytokines which may be involved in the disease process.
Additionally, blocking reagents may induce antigen-specific
tolerance of autoreactive T cells which could lead to
long-term relief from the disease. The efficacy of blocking
reagents in preventing or alleviating autoimmune disorders
can be determined using a number of well-characterized
animal models of human autoimmune diseases. Examples
include murine experimental autoimmune encephalitis,
systemic lupus erythematosus in MRL/lpr/lpr mice or NZB
hybrid mice, murine autoimmune collagen arthritis, diabetes
mellitus in NOD mice and BB rats, and murine experimental
myasthenia gravis (see Paul ed., Fundamental Immunology,
Raven Press, New York, 1989, pp. 840-856).

Upregulation of an antigen function (preferably a B
lymphocyte antigen function), as a means of up regulating
immune responses, may also be useful in therapy.
Upregulation of immune responses may be in the form of enhancing
an existing immune response or eliciting an initial immune
response. For example, enhancing an immune response
through stimulating B lymphocyte antigen function may be
useful in cases of viral infection. In addition, systemic viral
diseases such as influenza, the common cold, and
encephalitis might be alleviated by the administration of stimulatory
forms of B lymphocyte antigens systemically.

Alternatively, anti-viral immune responses may be
enhanced in an infected patient by removing T cells from the
patient, costimulating the T cells in vitro with viral
antigen-pulsed APCs either expressing a peptide of the present
invention or together with a stimulatory form of a soluble
peptide of the present invention and reintroducing the in
vitro activated T cells into the patient. Another method of
enhancing anti-viral immune responses would be to isolate
infected cells from a patient, transfect them with a nucleic
acid encoding a protein of the present invention as described
herein such that the cells express all or a portion of the
protein on their surface, and reintroduce the transfected cells
into the patient. The infected cells would now be capable of
delivering a costimulatory signal to, and thereby activate, T
cells in vivo.

In another application, up regulation or enhancement of
antigen function (preferably B lymphocyte antigen function)
may be useful in the induction of tumor immunity. Tumor
cells (e.g., sarcoma, melanoma, lymphoma, leukemia,
neuroblastoma, carcinoma) transfected with a nucleic acid
encoding at least one peptide of the present invention can be
administered to a subject to overcome tumor-specific
tolerance in the subject. If desired, the tumor cell can be
transfected to express a combination of peptides. For
example, tumor cells obtained from a patient can be
transfected ex vivo with an expression vector directing the
expression of a peptide having B7-2-like activity alone, or in
conjunction with a peptide having B7-1-like activity and/or
B7-3-like activity. The transfected tumor cells are returned
to the patient to result in expression of the peptides on the
surface of the transfected cell. Alternatively, gene therapy
techniques can be used to target a tumor cell for transfection
in vivo.

The presence of the peptide of the present invention
having the activity of a B lymphocyte antigen(s) on the
surface of the tumor cell provides the necessary
costimulation signal to T cells to induce a T cell mediated immune
response against the transfected tumor cells. In addition,
tumor cells which lack MHC class I or MHC class II
molecules, or which fail to reexpress sufficient amounts of
MHC class I or MHC class II molecules, can be transfected
with nucleic acid encoding all or a portion (e.g., a
cytoplasmic-domain truncated portion) of an MHC class I α
chain protein and β2 microglobulin protein or an MHC class
II α chain protein and an MHC class II β chain protein to
thereby express MHC class I or MHC class II proteins on the
cell surface. Expression of the appropriate class I or class II
MHC in conjunction with a peptide having the activity of a
B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T
cell mediated immune response against the transfected
tumor cell. Optionally, a gene encoding an antisense
construct which blocks expression of an MHC class II
associated protein, such as the invariant chain, can also be
cotransfected with a DNA encoding a peptide having the activity of
a B lymphocyte antigen to promote presentation of tumor
associated antigens and induce tumor specific immunity.
Thus, the induction of a T cell mediated immune response in
a human subject may be sufficient to overcome
tumor-specific tolerance in the subject.

The activity of a protein of the invention may, among
other means, be measured by the following methods:

Suitable assays for thymocyte or splenocyte cytotoxicity
include, without limitation, those described in: Current
Protocols in Immunology, Ed by J. E. Coligan, A. M.
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober,
Pub. Greene Publishing Associates and Wiley-Interscience
(Chapter 3, In Vitro assays for Mouse Lymphocyte Function
3.1-3.19; Chapter 7, Immunologic studies in Humans);
Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492,
1981; Herrmann et al., J. Immunol. 128:1968-1974, 1982;
Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al.,
J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol.
140:508-512, 1988; Bowman et al., J. Virology
61:1992-1998; Bertagnolli et al., Cellular Immunology
133:327-341, 1991; Brown et al., J. Immunol.
153:3079-3092, 1994.

Assays for T-cell-dependent immunoglobulin responses
and isotype switching (which will identify, among others,
proteins that modulate T-cell dependent antibody responses
and that affect Th1/Th2 profiles) include, without limitation,
those described in: Maliszewski, J. Immunol.
144:3028-3033, 1990; and Assays for B cell function: In
vitro antibody production, Mond, J. J. and Brunswick, M. In
Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol
1 pp. 3.8.1-3.8.16, John Wiley and Sons, Toronto. 1994.

Mixed lymphocyte reaction (MLR) assays (which will
identify, among others, proteins that generate predominantly
Th1 and CTL responses) include, without limitation, those
described in: Current Protocols in Immunology, Ed by J. E.
Coligan, A. M. Kruisbeek, D. H. Margulies, E. M. Shevach,
W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse
Lymphocyte Function 3.1-3.19; Chapter 7, Immunologic studies in
Humans); Takai et al., J. Immunol. 137:3494-3500, 1986;
Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et
al., J. Immunol. 149:3778-3783, 1992.

Dendritic cell-dependent assays (which will identify,
among others, proteins expressed by dendritic cells that
activate naive T-cells) include, without limitation, those
described in: Guery et al., J. Immunol. 134:536-544, 1995;
Inaba et al., Journal of Experimental Medicine
173:549-559, 1991; Macatonia et al., Journal of
Immunology 154:5071-5079, 1995; Porgador et al., Journal of
Experimental Medicine 182:255-260, 1995; Nair et al.,
Journal of Virology 67:4062-4069, 1993; Huang et al.,
Science 264:961-965, 1994; Macatonia et al., Journal of
Experimental Medicine 169:1255-1264, 1989; Bhardwaj et
al., Journal of Clinical Investigation 94:797-807, 1994; and
Inaba et al., Journal of Experimental Medicine
172:631-640, 1990.

Assays for lymphocyte survival/apoptosis (which will
identify, among others, proteins that prevent apoptosis after
superantigen induction and proteins that regulate
lymphocyte homeostasis) include, without limitation, those
described in: Darzynkiewicz et al., Cytometry 13:795-808,
1992; Gorczyca et al., Leukemia 7:659-670, 1993;
Gorczyca et al., Cancer Research 53:1945-1951, 1993; Itoh et al.,
Cell 66:233-243, 1991; Zacharchuk, Journal of
Immunology 145:4037-4045, 1990; Zamai et al., Cytometry
14:891-897, 1993; Gorczyca et al., International Journal of
Oncology 1:639-648, 1992.

Assays for proteins that influence early steps of T-cell
commitment and development include, without limitation,
those described in: Antica et al., Blood 84:111-117, 1994;
Fine et al., Cellular Immunology 155:111-122, 1994; Galy
et al., Blood 85:2770-2778, 1995; Toki et al., Proc. Nat.
Acad. Sci. USA 88:7548-7551, 1991.

Hematopoiesis Regulating Activity

A protein of the present invention may be useful in
regulation of hematopoiesis and, consequently, in the
treatment of myeloid or lymphoid cell deficiencies. Even
marginal biological activity in support of colony forming cells
or of factor-dependent cell lines indicates involvement in
regulating hematopoiesis, e.g. in supporting the growth and
proliferation of erythroid progenitor cells alone or in
combination with other cytokines, thereby indicating utility, for
example, in treating various anemias or for use in
conjunction with irradiation/chemotherapy to stimulate the
production of erythroid precursors and/or erythroid cells; in
supporting the growth and proliferation of myeloid cells such as
granulocytes and monocytes/macrophages (i.e., traditional
CSF activity) useful, for example, in conjunction with
chemotherapy to prevent or treat consequent
myelo-suppression; in supporting the growth and proliferation of
megakaryocytes and consequently of platelets thereby
allowing prevention or treatment of various platelet
disorders such as thrombocytopenia, and generally for use in
place of or complementary to platelet transfusions; and/or in
supporting the growth and proliferation of hematopoietic
stem cells which are capable of maturing to any and all of
the above-mentioned hematopoietic cells and therefore find
therapeutic utility in various stem cell disorders (such as
those usually treated with transplantation, including, without
limitation, aplastic anemia and paroxysmal nocturnal
hemoglobinuria), as well as in repopulating the stem cell
compartment post irradiation/chemotherapy, either in-vivo
or ex-vivo (i.e., in conjunction with bone marrow
transplantation or with peripheral progenitor cell transplantation
(homologous or heterologous)) as normal cells or
genetically manipulated for gene therapy.

The activity of a protein of the invention may, among
other means, be measured by the following methods:

Suitable assays for proliferation and differentiation of
various hematopoietic lines are cited above.

Assays for embryonic stem cell differentiation (which will
identify, among others, proteins that influence embryonic
differentiation hematopoiesis) include, without limitation,
those described in: Johansson et al. Cellular Biology
15:141-151, 1995; Keller et al., Molecular and Cellular
Biology 13:473-486, 1993; McClanahan et al., Blood
81:2903-2915, 1993.

Assays for stem cell survival and differentiation (which
will identify, among others, proteins that regulate
lympho-hematopoiesis) include, without limitation, those described
in: Methylcellulose colony forming assays, Freshney, M. G.
In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds.
Vol pp. 265-268, Wiley-Liss, Inc., New York, N.Y. 1994;
Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911,
1992; Primitive hematopoietic colony forming cells with
high proliferative potential, McNiece, I. K. and Briddell, R.
A. In Culture of Hematopoietic Cells. R. I. Freshney, et al.
eds. Vol pp. 23-39, Wiley-Liss, Inc., New York, N.Y. 1994;
Neben et al., Experimental Hematology 22:353-359, 1994;
Cobblestone area forming cell assay, Ploemacher, R. E. In
Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol
pp. 1-21, Wiley-Liss, Inc., New York, N.Y. 1994; Long term
bone marrow cultures in the presence of stromal cells,
Spooncer, E., Dexter, M. and Allen, T. In Culture of
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 163-179,
Wiley-Liss, Inc., New York, N.Y. 1994; Long term culture
initiating cell assay, Sutherland, H. J. In Culture of
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 139-162,
Wiley-Liss, Inc., New York, N.Y. 1994.

Tissue Growth Activity

A protein of the present invention also may have utility in
compositions used for bone, cartilage, tendon, ligament
and/or nerve tissue growth or regeneration, as well as for
wound healing and tissue repair and replacement, and in the
treatment of burns, incisions and ulcers.

A protein of the present invention, which induces
cartilage and/or bone growth in circumstances where bone is not
normally formed, has application in the healing of bone
fractures and cartilage damage or defects in humans and
other animals. Such a preparation employing a protein of the
invention may have prophylactic use in closed as well as
open fracture reduction and also in the improved fixation of
artificial joints. De novo bone formation induced by an
osteogenic agent contributes to the repair of congenital,
trauma induced, or oncologic resection induced craniofacial
defects, and also is useful in cosmetic plastic surgery.

A protein of this invention may also be used in the
treatment of periodontal disease, and in other tooth repair
processes. Such agents may provide an environment to
attract bone-forming cells, stimulate growth of
bone-forming cells or induce differentiation of progenitors of
bone-forming cells. A protein of the invention may also be
useful in the treatment of osteoporosis or osteoarthritis, such
as through stimulation of bone and/or cartilage repair or by
blocking inflammation or processes of tissue destruction
(collagenase activity, osteoclast activity, etc.) mediated by
inflammatory processes.

Another category of tissue regeneration activity that may
be attributable to the protein of the present invention is
tendon/ligament formation. A protein of the present
invention, which induces tendon/ligament-like tissue or
other tissue formation in circumstances where such tissue is
not normally formed, has application in the healing of
tendon or ligament tears, deformities and other tendon or
ligament defects in humans and other animals. Such a
preparation employing a tendon/ligament-like tissue
inducing protein may have prophylactic use in preventing damage
to tendon or ligament tissue, as well as use in the improved
fixation of tendon or ligament to bone or other tissues, and
in repairing defects to tendon or ligament tissue. De novo
tendon/ligament-like tissue formation induced by a
composition of the present invention contributes to the repair of
congenital, trauma induced, or other tendon or ligament
defects of other origin, and is also useful in cosmetic plastic
surgery for attachment or repair of tendons or ligaments. The
compositions of the present invention may provide an
environment to attract tendon- or ligament-forming cells, stimulate
growth of tendon- or ligament-forming cells, induce
differentiation of progenitors of tendon- or ligament-forming
cells, or induce growth of tendon/ligament cells or
progenitors ex vivo for return in vivo to effect tissue repair. The
compositions of the invention may also be useful in the
treatment of tendinitis, carpal tunnel syndrome and other
tendon or ligament defects. The compositions may also
include an appropriate matrix and/or sequestering agent as a
carrier as is well known in the art.

The protein of the present invention may also be useful for
proliferation of neural cells and for regeneration of nerve
and brain tissue, i.e. for the treatment of central and
peripheral nervous system diseases and neuropathies, as well as
mechanical and traumatic disorders, which involve
degeneration, death or trauma to neural cells or nerve tissue.
More specifically, a protein may be used in the treatment of
diseases of the peripheral nervous system, such as peripheral
nerve injuries, peripheral neuropathy and localized
neuropathies, and central nervous system diseases, such as
Alzheimer's, Parkinson's disease, Huntington's disease,
amyotrophic lateral sclerosis, and Shy-Drager syndrome.
Further conditions which may be treated in accordance with
the present invention include mechanical and traumatic
disorders, such as spinal cord disorders, head trauma and
cerebrovascular diseases such as stroke. Peripheral
neuropathies resulting from chemotherapy or other medical
therapies may also be treatable using a protein of the invention.

Proteins of the invention may also be useful to promote
better or faster closure of non-healing wounds, including
without limitation pressure ulcers, ulcers associated with
vascular insufficiency, surgical and traumatic wounds, and
the like.

It is expected that a protein of the present invention may
also exhibit activity for generation or regeneration of other
tissues, such as organs (including, for example, pancreas,
liver, intestine, kidney, skin, endothelium), muscle (smooth,
skeletal or cardiac) and vascular (including vascular
endothelium) tissue, or for promoting the growth of cells
comprising such tissues. Part of the desired effects may be
by inhibition or modulation of fibrotic scarring to allow
normal tissue to regenerate. A protein of the invention may
also exhibit angiogenic activity.

A protein of the present invention may also be useful for
gut protection or regeneration and treatment of lung or liver
fibrosis, reperfusion injury in various tissues, and conditions
resulting from systemic cytokine damage.
A protein of the present invention may also be useful for particular protein has chemotactic activity for a population
promoting or inhibiting differentiation of tissues described of cells can be readily determined by employing such
above from precursor tissues or cells; or for inhibiting the protein or peptide in any known assay for cell chemotaxis.
growth of tissues described above. The activity of a protein of the invention may, among
The activity of a protein of the invention may, among other means, be measured by the following methods:
other means, be measured by the following methods: Assays for chemotactic activity (which will identify
Assays for tissue generation activity include, without proteins that induce or prevent chemotaxis) consist of assays
limitation, those described in: International Patent that measure the ability of a protein to induce the migration
Publication No. WO95/16035 (bone, cartilage, tendon); of cells across a membrane as well as the ability of a protein
International Patent Publication No. WO95/05846 (nerve, to induce the adhesion of one cell population to another cell
neuronal); International Patent Publication No. population. Suitable assays for movement and adhesion
WO91/07491 (skin, endothelium). include, without limitation, those described in: Current
Assays for wound healing activity include, without Protocols in Immunology, Ed by J. E. Coligan, A. M.
limitation, those described in: Winter, Epidermal Wound Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober,
Healing, pps. 71-112 (Maibach, H. I. and Rovee, D. T., Pub. Greene Publishing Associates and Wiley-Interscience
eds.), Year Book Medical Publishers, Inc., Chicago, as (Chapter 6.12, Measurement of alpha and beta Chemokines
modified by Eaglstein and Mertz, J. Invest. Dermatol 6.12.1-6.12.28); Taub et al. J. Clin. Invest. 95:1370-1376,
71:382-84 (1978). 1995; Lind et al. APMIS 103:140-146, 1995; Muller et al.
Activin/Inhibin Activity Eur. J. Immunol. 25:1744-1748; Gruber et al. J. of Immunol.

A protein of the present invention may also exhibit 152:5860-5867, 1994; Johnston et al. J. of Immunol.
activin- or inhibin-related activities. Inhibins are 153:1762-1768, 1994.
characterized by their ability to inhibit the release of follicle Hemostatic and Thrombolytic Activity
stimulating hormone (FSH), while activins are characterized A protein of the invention may also exhibit hemostatic or
by their ability to stimulate the release of follicle stimulating thrombolytic activity. As a result, such a protein is expected
hormone (FSH). Thus, a protein of the present invention, to be useful in treatment of various coagulation disorders
alone or in heterodimers with a member of the inhibin α (including hereditary disorders, such as hemophilias) or to
family, may be useful as a contraceptive based on the ability enhance coagulation and other hemostatic events in treating
of inhibins to decrease fertility in female mammals and wounds resulting from trauma, surgery or other causes. A
decrease spermatogenesis in male mammals. Administration protein of the invention may also be useful for dissolving or
of sufficient amounts of other inhibins can induce infertility inhibiting formation of thromboses and for treatment and
in these mammals. Alternatively, the protein of the prevention of conditions resulting therefrom (such as, for
invention, as a homodimer or as a heterodimer with other example, infarction of cardiac and central nervous system
protein subunits of the inhibin-β group, may be useful as a vessels (e.g., stroke)).

fertility inducing therapeutic, based upon the ability of The activity of a protein of the invention may, among 

activin molecules in stimulating FSH release from cells of 35 other means, be measured by the following methods: 

the anterior pituitary. See, for cxanq^le, U.S. Pat No. 4,798, Assay for hemostatic and thrombolytic activity include, 

885. A protein of the invention may also be useful for without limitation, those described in: Linet et aL, J. din. 

advancement of the onset of fertility in sexually inunature Pharmacol. 26:131-140, 1986; Burdick et al., Thrombosis 

mahmials, so as to increase the lifetime reproductive per- Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 

fonnancc of domestic animals such as cows, sheep and pigs. 40 5:71-79 (1991); Schaub, Prostaglandins 35:467-474, 1988. 

The activity of a protein of the invention may, among Receptor/Ugand Activity 

other means, be measured by the following methods: A protein of the present invention may also demonstrate 

Assays for activin/inhibin activity include, without activity as recq)tOTS, receptor ligands or inhibitors or ago- 

limitation, those described in: Vale et aL, Endocrinology nists of receptor/ligand interactions. Examples of such 

91:562-572, 1972; ling et al., Nature 321:779-782, 1986; 45 receptors and ligands include, without limitation, cytokine 

Vale etal.. Nature 321:776-779, 1986; Mason etal.. Nature receptors and their ligands, receptor kinases and their 

318:659-663, 1985; Forage et al., Proc, Natl. Acad. Sd. ligands, receptor phosphatases and their ligands, receptors 

USA 83 J09 1-3095, 1986. involved in cell-cell interactions and their ligands (including 

Chemotactic/Chemokinetic Activity without limitation, cellular adhesion molecules (such as 

A protein of the present invention may have chemotactic 50 seledins, integrins and their ligands) and receptor/ligand 

or chemokinetic activity (e.g., act as a chemokine) for pairs involved in antigen presentation, antigen recognition 

mammalian cells, including, for example, monocytes, and development of cellular and humoral immune 

fibroblasts, neutrophils, T-ceUs, mast cells, eosinophils, epi- responses). Recq)tors and ligands are also useful for screen- 

thelial and/or endothelial cells. Chemota(^c and chemoki- ing of potential peptide or small molecule inhibitors of the 

netic proteins can be used to mobilize or attract a desired cell 55 relevant reccptoi/ligand interaction. A protein of the present 

population to a desired site of action. Chemotactic or chemo- invention (including, without limitation, fragments of recep- 

kinetic proteins provide particular advantages in treatment tors and ligands) may themselves be useful as inhibitors of 

of wounds and other trauma to tissues, as well as in receptoi/ligand interactions. 

treatment of localized infectious. For example, attraction of The activity of a protein of the invention may, among 

lyn^hocytes, monocytes or neutrophils to tumors or sites of 60 other means, be measured by the following methods: 

infection may result in improved iimnune responses against Suitable assays for receptor-ligand activity include wilh- 

the tumor or infecting agent out limitation those described in;Current Protocols in 

A protein or peptide has chemotactic activity for a par- Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. H. 

ticular ceD population if it can stimulate, direcdy or Margulies, E. M. Shevadi, W. Strober, Pub. Greene Pub- 

indiiectiy. the directed orientation or movement of such cell 65 lishing Associates and Wiley-Intendence (Ch^ter 72Z, 

population. Preferably, the protein or peptide has the ability Measurement of Cellular Adhesion under static conditions 

to direcdy stimulate directed movement of cells. Whether a 7.28.1-7.28.22), Takai et aL, Proc. Nati. Acad. ScL USA 
84:6864-6868, 1987; Bierer et al., J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J.
Exp. Med. 169:149-160, 1989; Stoltenborg et al., J. Immunol. Methods 175:59-68, 1994; Stitt et
al., Cell 80:661-670, 1995.

Anti-Inflammatory Activity

Proteins of the present invention may also exhibit anti-inflammatory activity. The
anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the
inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for
example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the
inflammatory process, inhibiting or promoting cell extravasation, or by stimulating or
suppressing production of other factors which more directly inhibit or promote an inflammatory
response. Proteins exhibiting such activities can be used to treat inflammatory conditions
(including chronic or acute conditions), including without limitation inflammation associated
with infection (such as septic shock, sepsis or systemic inflammatory response syndrome
(SIRS)), ischemia-reperfusion injury, endotoxin lethality, arthritis, complement-mediated
hyperacute rejection, nephritis, cytokine or chemokine-induced lung injury, inflammatory bowel
disease, Crohn's disease or resulting from over production of cytokines such as TNF or IL-1.
Proteins of the invention may also be useful to treat anaphylaxis and hypersensitivity to an
antigenic substance or material.

Tumor Inhibition Activity

In addition to the activities described above for immunological treatment or prevention of
tumors, a protein of the invention may exhibit other anti-tumor activities. A protein may
inhibit tumor growth directly or indirectly (such as, for example, via ADCC). A protein may
exhibit its tumor inhibitory activity by acting on tumor tissue or tumor precursor tissue, by
inhibiting formation of tissues necessary to support tumor growth (such as, for example, by
inhibiting angiogenesis), by causing production of other factors, agents or cell types which
inhibit tumor growth, or by suppressing, eliminating or inhibiting factors, agents or cell
types which promote tumor growth.

Other Activities

A protein of the invention may also exhibit one or more of the following additional activities
or effects: inhibiting the growth, infection or function of, or killing, infectious agents,
including, without limitation, bacteria, viruses, fungi and other parasites; affecting
(suppressing or enhancing) bodily characteristics, including, without limitation, height,
weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ
or body part size or shape (such as, for example, breast augmentation or diminution, change in
bone form or shape); affecting biorhythms or circadian cycles or rhythms; affecting the
fertility of male or female subjects; affecting the metabolism, catabolism, anabolism,
processing, utilization, storage or elimination of dietary fat, lipid, protein, carbohydrate,
vitamins, minerals, cofactors or other nutritional factors or component(s); affecting
behavioral characteristics, including, without limitation, appetite, libido, stress, cognition
(including cognitive disorders), depression (including depressive disorders) and violent
behaviors; providing analgesic effects or other pain reducing effects; promoting
differentiation and growth of embryonic stem cells in lineages other than hematopoietic
lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of
the enzyme and treating deficiency-related diseases; treatment of hyperproliferative disorders
(such as, for example, psoriasis); immunoglobulin-like activity (such as, for example, the
ability to bind antigens or complement); and the ability to act as an antigen in a vaccine
composition to raise an immune response against such protein or another material or entity
which is cross-reactive with such protein.

ADMINISTRATION AND DOSING

A protein of the present invention (from whatever source derived, including without limitation
from recombinant and non-recombinant sources) may be used in a pharmaceutical composition when
combined with a pharmaceutically acceptable carrier. Such a composition may also contain (in
addition to protein and a carrier) diluents, fillers, salts, buffers, stabilizers,
solubilizers, and other materials well known in the art. The term "pharmaceutically
acceptable" means a non-toxic material that does not interfere with the effectiveness of the
biological activity of the active ingredient(s). The characteristics of the carrier will
depend on the route of administration. The pharmaceutical composition of the invention may
also contain cytokines, lymphokines, or other hematopoietic factors such as M-CSF, GM-CSF,
TNF, IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-14,
IL-15, IFN, TNF0, TNF1, TNF2, G-CSF, Meg-CSF, thrombopoietin, stem cell factor, and
erythropoietin. The pharmaceutical composition may further contain other agents which either
enhance the activity of the protein or complement its activity or use in treatment. Such
additional factors and/or agents may be included in the pharmaceutical composition to produce
a synergistic effect with protein of the invention, or to minimize side effects. Conversely,
protein of the present invention may be included in formulations of the particular cytokine,
lymphokine, other hematopoietic factor, thrombolytic or anti-thrombotic factor, or
anti-inflammatory agent to minimize side effects of the cytokine, lymphokine, other
hematopoietic factor, thrombolytic or anti-thrombotic factor, or anti-inflammatory agent.

A protein of the present invention may be active in multimers (e.g., heterodimers or
homodimers) or complexes with itself or other proteins. As a result, pharmaceutical
compositions of the invention may comprise a protein of the invention in such multimeric or
complexed form.

The pharmaceutical composition of the invention may be in the form of a complex of the
protein(s) of present invention along with protein or peptide antigens. The protein and/or
peptide antigen will deliver a stimulatory signal to both B and T lymphocytes. B lymphocytes
will respond to antigen through their surface immunoglobulin receptor. T lymphocytes will
respond to antigen through the T cell receptor (TCR) following presentation of the antigen by
MHC proteins. MHC and structurally related proteins including those encoded by class I and
class II MHC genes on host cells will serve to present the peptide antigen(s) to T
lymphocytes. The antigen components could also be supplied as purified MHC-peptide complexes
alone or with co-stimulatory molecules that can directly signal T cells. Alternatively
antibodies able to bind surface immunoglobulin and other molecules on B cells as well as
antibodies able to bind the TCR and other molecules on T cells can be combined with the
pharmaceutical composition of the invention.

The pharmaceutical composition of the invention may be in the form of a liposome in which
protein of the present invention is combined, in addition to other pharmaceutically acceptable
carriers, with amphipathic agents such as lipids which exist in aggregated form as micelles,
insoluble monolayers, liquid crystals, or lamellar layers in aqueous solution. Suitable lipids
for liposomal formulation include,
without limitation, monoglycerides, diglycerides, sulfatides, lysolecithin, phospholipids,
saponin, bile acids, and the like. Preparation of such liposomal formulations is within the
level of skill in the art, as disclosed, for example, in U.S. Pat. Nos. 4,235,871; 4,501,728;
4,837,028; and 4,737,323, all of which are incorporated herein by reference.

As used herein, the term "therapeutically effective amount" means the total amount of each
active component of the pharmaceutical composition or method that is sufficient to show a
meaningful patient benefit, i.e., treatment, healing, prevention or amelioration of the
relevant medical condition, or an increase in rate of treatment, healing, prevention or
amelioration of such conditions. When applied to an individual active ingredient, administered
alone, the term refers to that ingredient alone. When applied to a combination, the term
refers to combined amounts of the active ingredients that result in the therapeutic effect,
whether administered in combination, serially or simultaneously.

In practicing the method of treatment or use of the present invention, a therapeutically
effective amount of protein of the present invention is administered to a mammal having a
condition to be treated. Protein of the present invention may be administered in accordance
with the method of the invention either alone or in combination with other therapies such as
treatments employing cytokines, lymphokines or other hematopoietic factors. When
co-administered with one or more cytokines, lymphokines or other hematopoietic factors,
protein of the present invention may be administered either simultaneously with the
cytokine(s), lymphokine(s), other hematopoietic factor(s), thrombolytic or anti-thrombotic
factors, or sequentially. If administered sequentially, the attending physician will decide on
the appropriate sequence of administering protein of the present invention in combination with
cytokine(s), lymphokine(s), other hematopoietic factor(s), thrombolytic or anti-thrombotic
factors.

Administration of protein of the present invention used in the pharmaceutical composition or
to practice the method of the present invention can be carried out in a variety of
conventional ways, such as oral ingestion, inhalation, topical application or cutaneous,
subcutaneous, intraperitoneal, parenteral or intravenous injection. Intravenous administration
to the patient is preferred.

When a therapeutically effective amount of protein of the present invention is administered
orally, protein of the present invention will be in the form of a tablet, capsule, powder,
solution or elixir. When administered in tablet form, the pharmaceutical composition of the
invention may additionally contain a solid carrier such as a gelatin or an adjuvant. The
tablet, capsule, and powder contain from about 5 to 95% protein of the present invention, and
preferably from about 25 to 90% protein of the present invention. When administered in liquid
form, a liquid carrier such as water, petroleum, oils of animal or plant origin such as peanut
oil, mineral oil, soybean oil, or sesame oil, or synthetic oils may be added. The liquid form
of the pharmaceutical composition may further contain physiological saline solution, dextrose
or other saccharide solution, or glycols such as ethylene glycol, propylene glycol or
polyethylene glycol. When administered in liquid form, the pharmaceutical composition contains
from about 0.5 to 90% by weight of protein of the present invention, and preferably from about
1 to 50% protein of the present invention.

When a therapeutically effective amount of protein of the present invention is administered by
intravenous, cutaneous or subcutaneous injection, protein of the present invention will be in
the form of a pyrogen-free, parenterally acceptable aqueous solution. The preparation of such
parenterally acceptable protein solutions, having due regard to pH, isotonicity, stability,
and the like, is within the skill in the art. A preferred pharmaceutical composition for
intravenous, cutaneous, or subcutaneous injection should contain, in addition to protein of
the present invention, an isotonic vehicle such as Sodium Chloride Injection, Ringer's
Injection, Dextrose Injection, Dextrose and Sodium Chloride Injection, Lactated Ringer's
Injection, or other vehicle as known in the art. The pharmaceutical composition of the present
invention may also contain stabilizers, preservatives, buffers, antioxidants, or other
additives known to those of skill in the art.

The amount of protein of the present invention in the pharmaceutical composition of the
present invention will depend upon the nature and severity of the condition being treated, and
on the nature of prior treatments which the patient has undergone. Ultimately, the attending
physician will decide the amount of protein of the present invention with which to treat each
individual patient. Initially, the attending physician will administer low doses of protein of
the present invention and observe the patient's response. Larger doses of protein of the
present invention may be administered until the optimal therapeutic effect is obtained for the
patient, and at that point the dosage is not increased further. It is contemplated that the
various pharmaceutical compositions used to practice the method of the present invention
should contain about 0.01 μg to about 100 mg (preferably about 0.1 μg to about 10 mg, more
preferably about 0.1 μg to about 1 mg) of protein of the present invention per kg body weight.

The duration of intravenous therapy using the pharmaceutical composition of the present
invention will vary, depending on the severity of the disease being treated and the condition
and potential idiosyncratic response of each individual patient. It is contemplated that the
duration of each application of the protein of the present invention will be in the range of
12 to 24 hours of continuous intravenous administration. Ultimately the attending physician
will decide on the appropriate duration of intravenous therapy using the pharmaceutical
composition of the present invention.

Protein of the invention may also be used to immunize animals to obtain polyclonal and
monoclonal antibodies which specifically react with the protein. Such antibodies may be
obtained using either the entire protein or fragments thereof as an immunogen. The peptide
immunogens additionally may contain a cysteine residue at the carboxyl terminus, and are
conjugated to a hapten such as keyhole limpet hemocyanin (KLH). Methods for synthesizing such
peptides are known in the art, for example, as in R. B. Merrifield, J. Amer. Chem. Soc. 85,
2149-2154 (1963); J. L. Krstenansky, et al., FEBS Lett. 211, 10 (1987). Monoclonal antibodies
binding to the protein of the invention may be useful diagnostic agents for the
immunodetection of the protein. Neutralizing monoclonal antibodies binding to the protein may
also be useful therapeutics for both conditions associated with the protein and also in the
treatment of some forms of cancer where abnormal expression of the protein is involved. In the
case of cancerous cells or leukemic cells, neutralizing monoclonal antibodies against the
protein may be useful in detecting and preventing the metastatic spread of the cancerous
cells, which may be mediated by the protein.

For compositions of the present invention which are useful for bone, cartilage, tendon or
ligament regeneration,
the therapeutic method includes administering the composition
topically, systemically, or locally as an implant or
device. When administered, the therapeutic composition for
use in this invention is, of course, in a pyrogen-free,
physiologically acceptable form. Further, the composition may
desirably be encapsulated or injected in a viscous form for
delivery to the site of bone, cartilage or tissue damage.
Topical administration may be suitable for wound healing
and tissue repair. Therapeutically useful agents other than a
protein of the invention which may also optionally be
included in the composition as described above, may
alternatively or additionally, be administered simultaneously or
sequentially with the composition in the methods of the
invention. Preferably for bone and/or cartilage formation,
the composition would include a matrix capable of delivering
the protein-containing composition to the site of bone
and/or cartilage damage, providing a structure for the
developing bone and cartilage and optimally capable of being
resorbed into the body. Such matrices may be formed of
materials presently in use for other implanted medical
applications.

The choice of matrix material is based on
biocompatibility, biodegradability, mechanical properties,
cosmetic appearance and interface properties. The particular
application of the compositions will define the appropriate
formulation. Potential matrices for the compositions may be
biodegradable and chemically defined calcium sulfate,
tricalciumphosphate, hydroxyapatite, polylactic acid,
polyglycolic acid and polyanhydrides. Other potential materials
are biodegradable and biologically well-defined, such as
bone or dermal collagen. Further matrices are comprised of
pure proteins or extracellular matrix components. Other
potential matrices are nonbiodegradable and chemically
defined, such as sintered hydroxyapatite, bioglass,
aluminates, or other ceramics. Matrices may be comprised
of combinations of any of the above mentioned types of
material, such as polylactic acid and hydroxyapatite or
collagen and tricalciumphosphate. The bioceramics may be
altered in composition, such as in calcium-aluminate-phosphate
and processing to alter pore size, particle size,
particle shape, and biodegradability.

Presently preferred is a 50:50 (mole weight) copolymer of
lactic acid and glycolic acid in the form of porous particles
having diameters ranging from 150 to 800 microns. In some
applications, it will be useful to utilize a sequestering agent,
such as carboxymethyl cellulose or autologous blood clot, to
prevent the protein compositions from disassociating from
the matrix.

A preferred family of sequestering agents is cellulosic
materials such as alkylcelluloses (including
hydroxyalkylcelluloses), including methylcellulose,
ethylcellulose, hydroxyethylcellulose,
hydroxypropylcellulose, hydroxypropyl-methylcellulose,
and carboxymethylcellulose, the most preferred being
cationic salts of carboxymethylcellulose (CMC). Other
preferred sequestering agents include hyaluronic acid, sodium
alginate, poly(ethylene glycol), polyoxyethylene oxide,
carboxyvinyl polymer and poly(vinyl alcohol). The amount of
sequestering agent useful herein is 0.5-20 wt %, preferably
1-10 wt % based on total formulation weight, which
represents the amount necessary to prevent desorption of the
protein from the polymer matrix and to provide appropriate
handling of the composition, yet not so much that the
progenitor cells are prevented from infiltrating the matrix,
thereby providing the protein the opportunity to assist the
osteogenic activity of the progenitor cells.
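
As a purely illustrative aid, the weight-percent ranges stated above (0.5-20 wt %, preferably 1-10 wt %, based on total formulation weight) can be turned into a small calculation. The following is a minimal sketch only; the function names and the 10 g example batch are hypothetical and do not come from the specification.

```python
# Illustrative sketch only: function names and the example batch size are
# hypothetical. The numeric ranges are those stated in the text above.

USEFUL_RANGE_WT_PCT = (0.5, 20.0)     # "0.5-20 wt %"
PREFERRED_RANGE_WT_PCT = (1.0, 10.0)  # "preferably 1-10 wt %"


def sequestering_agent_mass(total_formulation_g: float, wt_percent: float) -> float:
    """Return grams of sequestering agent for a given total formulation weight.

    wt_percent is expressed relative to the total formulation weight and must
    lie inside the stated useful range.
    """
    low, high = USEFUL_RANGE_WT_PCT
    if not low <= wt_percent <= high:
        raise ValueError(f"{wt_percent} wt % is outside the stated {low}-{high} wt % range")
    return total_formulation_g * wt_percent / 100.0


def is_preferred(wt_percent: float) -> bool:
    """True when the loading falls inside the stated preferred range."""
    low, high = PREFERRED_RANGE_WT_PCT
    return low <= wt_percent <= high


if __name__ == "__main__":
    # Hypothetical 10 g batch at 5 wt % sequestering agent.
    print(sequestering_agent_mass(10.0, 5.0), is_preferred(5.0))
```

The sketch treats both ranges as inclusive at their endpoints, which is an assumption; the text does not say how boundary values are handled.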

In further compositions, proteins of the invention may be
combined with other agents beneficial to the treatment of the
bone and/or cartilage defect, wound, or tissue in question.
These agents include various growth factors such as epidermal
growth factor (EGF), platelet derived growth factor
(PDGF), transforming growth factors (TGF-α and TGF-β),
and insulin-like growth factor (IGF).

The therapeutic compositions are also presently valuable
for veterinary applications. Particularly domestic animals
and thoroughbred horses, in addition to humans, are desired
patients for such treatment with proteins of the present
invention.

The dosage regimen of a protein-containing pharmaceutical
composition to be used in tissue regeneration will be
determined by the attending physician considering various
factors which modify the action of the proteins, e.g., amount
of tissue weight desired to be formed, the site of damage, the
condition of the damaged tissue, the size of a wound, type
of damaged tissue (e.g., bone), the patient's age, sex, and
diet, the severity of any infection, time of administration and
other clinical factors. The dosage may vary with the type of
matrix used in the reconstitution and with inclusion of other
proteins in the pharmaceutical composition. For example,
the addition of other known growth factors, such as IGF I
(insulin like growth factor I), to the final composition, may
also affect the dosage. Progress can be monitored by periodic
assessment of tissue/bone growth and/or repair, for
example, X-rays, histomorphometric determinations and
tetracycline labeling.

Polynucleotides of the present invention can also be used
for gene therapy. Such polynucleotides can be introduced
either in vivo or ex vivo into cells for expression in a
mammalian subject. Polynucleotides of the invention may
also be administered by other known methods for introduction
of nucleic acid into a cell or organism (including,
without limitation, in the form of viral vectors or naked
DNA).

Cells may also be cultured ex vivo in the presence of
proteins of the present invention in order to proliferate or to
produce a desired effect on or activity in such cells. Treated
cells can then be introduced in vivo for therapeutic purposes.

Patent and literature references cited herein are incorporated
by reference as if fully set forth.



SEQUENCE LISTING 



(1) GENERAL INFORMATION:

(iii) NUMBER OF SEQUENCES: 11

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 432 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GGTTTGAAAA CTCTGCTTCC TTTGTGAATT TGGTGTTAGG AGTTCTTATT GTTATTCTGC 60

AGCCTTTACT ATTGTCCTTT ATTTACTGAA CACAGTGAAT ACCAAGCACT GTTTATTAGA 120

GGTTAGGAGT AGGGGCAGGT GATTAAAAAA ACAAAAAAGC TAATAATCTC CTCAAGCAAT 180

TTCTGGCCTA ATAGAATTAT AGTAGACAGT GAAGTATCTA AACCCAGGGA ATCAGATTGA 240

GGCACCATGT CCATCGCCTT GAGAATTAAT AGGCTGCATT TCTGGGTTCT CCNTTTTTTT 300

TTTTTTTTTG CCCAACTGAG TCTTTCTGTG GACTTACATG GAACTTCTTA TTCTCTTAAA 360

TCATTAAGTT ACTTGACAAT ATTCTTGGAT TTGGAGAAAC TGGATGTAGG GCCGTATGAA 420

AAAATCATTC GA 432

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 62 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ser Ile Ala Leu Arg Ile Asn Arg Leu His Phe Trp Val Leu Xaa
1               5                   10                  15

Phe Phe Phe Phe Phe Ala Gln Leu Ser Leu Ser Val Asp Leu His Gly
            20                  25                  30

Thr Ser Tyr Ser Leu Lys Ser Leu Ser Tyr Leu Thr Ile Phe Leu Asp
        35                  40                  45

Leu Glu Lys Leu Asp Val Gly Pro Tyr Glu Lys Ile Ile Arg
    50                  55                  60

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 219 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

ATAGGATACN GTATCTNGCT TTTTTCATTT AAACGTCGNG AGCAATTTTC CCAAGACATA 60

ACAAACTGTC TTNGAAAAAN GGAAAACATT NGGGGCTGTC AGCANAACNG AAAATGTTTT 120

CTGGGTGAGA CACATGTATC TTNGNAATGG GTTGGATTTA GTGTGCTTTA TTTCAATAAA 180

AATTCAGTAT TATAATTTAA AAAAAAAAAA AAAAAAAAA 219

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 501 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TCCACAGGTG TCCANTCCCA GGTCCAACTG CAGATTTCGA ATTCGGCCTT CATGGCCTAG 60

AGCGACGCGG AGAARAGCTC CGGGTGCCGC GGCACTGCAG CGCTGAGATT CCTTTACAAA 120

GAAACTCAGA GGACCGGGAA GAAAGAATTT CACCTTTGCG ACGTGCTAGA AAATAARGTC 180

GTCTGGGAAA AGGACTGGAG ACACAAGCGC ATCSCAASYY SRGTGAAGGA SAAASNGAKG 240

GANBTAKWWM MGWGSWGAAA AATKTYWWKC AAMMWMGGTA TTTTCCCTTG GATATTAACT 300

TGCATATCTG AAGAAATGGC ATTCCGGACA ATTTGCGTGT TGGTTGGAGT ATTTATTTGT 360

TCTATCTGTG TGAAAGGATC TTCCCAGCCC CAAGCAAGAG TTTATTTAAC ATTTGATGAA 420

CTTCGAGAAA CCAAGACCTC TGAATACTTC AGCCTTTCCC ACCATCCTTT AGACTACAGG 480

ATTTTATTAA TGGATGAAGA T 501

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 62 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Ala Phe Arg Thr Ile Cys Val Leu Val Gly Val Phe Ile Cys Ser
1               5                   10                  15

Ile Cys Val Lys Gly Ser Ser Gln Pro Gln Ala Arg Val Tyr Leu Thr
            20                  25                  30

Phe Asp Glu Leu Arg Glu Thr Lys Thr Ser Glu Tyr Phe Ser Leu Ser
        35                  40                  45

His His Pro Leu Asp Tyr Arg Ile Leu Leu Met Asp Glu Asp
    50                  55                  60

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 302 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CTAGCACTAG ACATGTCATG GTCTTCATGG TGCATATAAA TATATTTAAC TTAACCCAGA 60

TTTTATTTAT ATCTTTATTC ACCTTTTCTT CAAAATCGAT ATGGTGGCTG CAAAACTAGA 120

ATTGTTGCAT CCCTCAATNG AATGAGGGCC ATATCCCTGT GGTATTCCTT TCCTGCTTNG 180

GGGCTTTAGA ATTCTAATTG TCAGTGATTT TGTATATGAA AACAAGTTCC AAATCCACAG 240

CTTTTACGTA GTAAAAGTCA TAAATGCATA TGACAGAATG GCTATCAAAA GAAAAAAAAA 300

AA 302

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 448 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GGCGAARGCA GCGGCAGGTC GGGAGCAARA TGGCGCTGCG GCCAGGAGCT GGTTCTGGTG 60

GCGGCGGGGC CGCGARGAKY ATRRYGYGRK KTYYRYYSKG KKWKSMGGST TCATGTTTCC 120

TGTTGCAGGT GGGATAAGAC CCCCTCAAGG CCTGATGCCG ATGCAGCAAC AAGGATTTCC 180

TATGGTCTCT GTCATGCAGC CTAATATGCA AGGCATTATG GGAATGAATT ACAGCTCTCA 240

GATGTCCCAA GGACCTATTG CTATGCAGGC AGGAATACCA ATGGGACCAA TGCCAGCAGC 300

GGGAATGCCT TACCTAGGAC AAGCACCCTT CCTGGGCATG CGTCCTCCAG GCCCACAGTA 360

CACTCCAGAC ATGCAGAAGC AGTTTGCCGA AGAGCAGCAG AAACGATTTG AACAGCAGCA 420

AAAACTCTTA GAAAAAAAAA AAAAAAAA 448

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 107 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Phe Pro Val Ala Gly Gly Ile Arg Pro Pro Gln Gly Leu Met Pro
1               5                   10                  15

Met Gln Gln Gln Gly Phe Pro Met Val Ser Val Met Gln Pro Asn Met
            20                  25                  30

Gln Gly Ile Met Gly Met Asn Tyr Ser Ser Gln Met Ser Gln Gly Pro
        35                  40                  45

Ile Ala Met Gln Ala Gly Ile Pro Met Gly Pro Met Pro Ala Ala Gly
    50                  55                  60

Met Pro Tyr Leu Gly Gln Ala Pro Phe Leu Gly Met Arg Pro Pro Gly
65                  70                  75                  80

Pro Gln Tyr Thr Pro Asp Met Gln Lys Gln Phe Ala Glu Glu Gln Gln
                85                  90                  95

Lys Arg Phe Glu Gln Gln Gln Lys Leu Leu Glu
            100                 105



(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

GNGCCTCAAT CTGATTCCCT GGGTTTAGA 29

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GNCCGGAATG CCATTTCTTC AGATATGCA 29



(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

TNCCATTGGT ATTCCTGCCT GCATAGCAA 29



What is claimed is: 

1. An isolated polynucleotide selected from the group 
consisting of: 

(a) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1;

(b) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1 from nucleotide 247 to nucleotide
432;

(c) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1 from nucleotide 328 to nucleotide
432;

(d) a polynucleotide comprising the nucleotide sequence
of the full length protein coding sequence of clone
BD372_5 deposited under accession number ATCC
98146;

(e) a polynucleotide encoding the full length protein
encoded by the cDNA insert of clone BD372_5 deposited
under accession number ATCC 98146;

(f) a polynucleotide comprising the nucleotide sequence
of the mature protein coding sequence of clone
BD372_5 deposited under accession number ATCC
98146;

(g) a polynucleotide encoding the mature protein encoded
by the cDNA insert of clone BD372_5 deposited under
accession number ATCC 98146; and

(h) a polynucleotide encoding a protein comprising the
amino acid sequence of SEQ ID NO:2.

2. The polynucleotide of claim 1 comprising the nucleotide
sequence of SEQ ID NO:1.

3. The polynucleotide of claim 1 comprising the nucleotide
sequence of SEQ ID NO:1 from nucleotide 247 to
nucleotide 432.

4. The polynucleotide of claim 1 comprising the nucleotide
sequence of SEQ ID NO:1 from nucleotide 328 to
nucleotide 432.

5. The polynucleotide of claim 1 comprising the nucleotide
sequence of the full length protein coding sequence of
clone BD372_5 deposited under accession number ATCC
98146.

6. The polynucleotide of claim 1 encoding the full length
protein encoded by the cDNA insert of clone BD372_5
deposited under accession number ATCC 98146.

7. The polynucleotide of claim 1 comprising the nucleotide
sequence of the mature protein coding sequence of
clone BD372_5 deposited under accession number ATCC
98146.

8. The polynucleotide of claim 1 encoding the mature
protein encoded by the cDNA insert of clone BD372_5
deposited under accession number ATCC 98146.

9. The polynucleotide of claim 1 encoding a protein
comprising the amino acid sequence of SEQ ID NO:2.

10. A vector comprising a polynucleotide of claim 1
wherein said polynucleotide is operably linked to an
expression control sequence.

11. A host cell transformed with a vector of claim 2.

12. The host cell of claim 3, wherein said cell is a
mammalian cell.

13. A process for producing a protein, which comprises:

(a) growing a culture of the host cell of claim 3 in a
suitable culture medium; and

(b) purifying the protein from the culture.

14. An isolated gene corresponding to the cDNA sequence
of SEQ ID NO:1.

* * * * *



United States Patent [19]

Stashenko et al.

US005552281A

[11] Patent Number: 5,552,281
[45] Date of Patent: Sep. 3, 1996

[54] HUMAN OSTEOCLAST-SPECIFIC AND -RELATED GENES

[75] Inventors: Philip Stashenko, Norfolk; Yi-Ping Li,
Boston; Anne L. Wucherpfennig,
Brookline, all of Mass.

[73] Assignee: Forsyth Dental Infirmary for
Children, Boston, Mass.

[21] Appl. No.: 392^78 
[22] Filed: Feb. 23, 1995 

Related U.S. Application Data

[63] Continuation of Ser. No. 45,270, Apr. 6, 1993, abandoned.

[51] Int. Cl. C07H 21/04; C12N 5/10; C12N 15/70; C12Q 1/68

[52] U.S. Cl. 435/6; 435/69.1; 435/172.3; 435/252.3; 435/320.1; 536/23.1

[58] Field of Search 435/6, 320.1, 252.3, 69.1, 172.3; 536/23.1

[56] References Cited

PUBLICATIONS

Blair, Harry C. et al., "Extracellular-matrix degradation at
acid pH. Avian osteoclast acid collagenase isolation and
characterization", Biochemical Journal 290(3):873-884 (15
Mar. 1993).

Tezuka, Ken-Ichi, et al., "Identification of osteopontin in
isolated rabbit osteoclasts", Biochemical and Biophysical
Research Communications 186(2):914-916 (31 Jul. 1992).

Tezuka, Ken-Ichi, et al., "Molecular cloning of a possible
cysteine proteinase predominantly expressed in osteoclasts",
Journal of Biological Chemistry 269(2):1106-1108 (14 Jan.
1994).

Horton, Michael A. et al., "Monoclonal Antibodies to
Osteoclastomas (Giant Cell Bone Tumors): Definition of
Osteoclast-specific Cellular Antigens," Cancer Research 45,
5663-5669 (Nov. 1985).

Davies, John et al., "The Osteoclast Functional Antigen,
Implicated in the Regulation of Bone Resorption, Is
Biochemically Related to the Vitronectin Receptor," The
Journal of Cell Biology 109, 1817-1826 (Oct. 1989).

Hayman, Alison R. et al., "Purification and characterization
of a tartrate-resistant acid phosphatase from human
osteoclastomas," Biochem. J. 261, 601-609 (1989).

Sandberg, M. et al., "Localization of the Expression of
Types I, III, and IV Collagen, TGF-β1 and c-fos Genes in
Developing Human Calvarial Bones," Developmental Biology
130, 324-334 (1988).

Sandberg, M. et al., "Enhanced expression of TGF-β and
c-fos mRNAs in the growth plates of developing human
long bones," Development 102, 461-470 (1988).

Ek-Rylander, Barbro et al., "Cloning, Sequence, and
Developmental Expression of a Type 5, Tartrate-resistant, Acid
Phosphatase of Rat Bone," The Journal of Biological
Chemistry 266(36), 24684-24689 (Dec. 25, 1991).

GenBank/EMBL Sequence Search Printout, pp. 1-19 (Jun.
24, 1993).

Primary Examiner— Gary Jones 
Assistant Examiner— Paul B. Tran 
Attorney, Agent, or Firm— Hamilton, Brook, Smith & Reynolds, P.C.



[57] ABSTRACT

The present invention relates to purified DNA sequences
encoding all or a portion of an osteoclast-specific or -related
gene product and a method for identifying such sequences.
The invention also relates to antibodies directed against an
osteoclast-specific or -related gene product. Also claimed
are DNA constructs capable of replicating DNA encoding all
or a portion of an osteoclast-specific or -related gene product,
and DNA constructs capable of directing expression in
a host cell of an osteoclast-specific or -related gene product.



5 Claims, 1 Drawing Sheet 
1 AGACACCTCT GCCCTCACCA TGAGCCTCTG GCAGCCCCTG GTCCTGGTGC TCCTGGTGCT

61 GGGCTGCTGC TTTGCTGCCC CCAGACAGCG CCAGTCCACC CTTGTGCTCT TCCCTGGAGA



121 CCTGAGAACC AATCTCACCG ACAGGCAGCT GGCAGAGGAA TACCTGTACC GCTATGGTTA 

181 CACTCGGGTG GCAGAGATGC GTGGAGAGTC GAAATCTCTG GGGCCTGCGC TGCTGCTTCT 

241 CCAGAAGCAA CTGTCCCTGC CCGAGACCGG TGAGCTGGAT AGCGCCACGC TGAAGGCCAT 

.301 6CQAACCCCA CGGTGCGGGG TCCCAGACCT GGGCAGATTC CAAACCTTTG AGGGOGACCT 

361 CAAGTGGCAC CACCACAACA TCACCTATTG GATCCAAAAC TACTCGGAAG ACTTGCCGCG 

421 GGCGGTGATT GACGACGCCT TTGCCCGCGC CTTCGCACTG TGGAGCGCGG TGACGCCGCT 

481 CACCTTCACT CGCGTGTACA GCCGGGACGC AGACATCGTC ATCCAGTTTG GTGTCGCGGA 

541 GCACGGAGAC GGGTATCCCT TCGACX3GGAA GGACGGGCTC CTGGCACACG CCTTTCCTCC 

601 TGGCCCCGGC ATTCAGGGAG ACGCCCATTT CGACGATGAC GAGTTGTGGT CCCTGGGCAA 

661 GGGCg rCGTG GTTCCAACTC GGTTTGGAAA CGCAGATGGC GCGGCCTGCC ACTTCCCCTT 

721 CATCTTCGAG GGCCGCTCCT ACTCTQCCTQ CACCACCGAC GGTCGCTCCG ACQGQTTGCC 

781 CTGGTGCAGT ACCACGGCCA ACTACGACAC CGACGACCGQ TTTGGCTTCT GCCCCAGCGA 

841 GAGACTCTAC ACCCGGGACG GCAATGCTGA TGGGAAACCC TGCCAGTTTC CATTCATCTT 

901 CCAAGGCCAA TCCTACTCCG CCTGCACCAC GGACGGTCGC TCCGACGGCT ACCGCTGGTG 

961 CGCCACCA CC GCCAACTACG ACCX3GGACAA GCTCTTCGGC TTCTGCCCGA CCCGAGCTGA 

1021 CTCGACGGTG ATGGGGGGCA ACTCOGCGGG GGAGCTGTGC GTCTTCCCCT TCACTTTCCT 

1081 GGGTAAGGAG TACTCGACCT GTACCAQCGA GGGCCGCGGA GATGGGCGCC TCTGGTGCGC 

1141 TACCACCTCG AACTTTGACA GCGACAA GAA GTGGGGCTTC TGCCCGGACC AAGGATACAG 

1201 TTTGTTCCTC GTGGCGGCGC ATGAGTTC6G CCACGCGCTG GGCTTAGATC ATTCCTCAGT 

1261 GCCGGAGGCG CTCATGTACC CTATGTACCG CTTCACTGAG GGGCCCCCCT TGCATAAGGA 

1321 CGACGTGAAT GGCATCCGQC ACCTCTATGG TCCTCGCCCT GAACCTGAGC CACGGCCTCC 

1381 AACCACCACC ACACCGCAGC CCACGGCTCC CCCGACGGTC TGCCCCACCG GACCCCCCAC 

1441 TGTCCACCCC TCAGAGCGCC CCACAGCT6G CCCCACAGGT CCCCCCTCAG CTGGCCCCAC 

1501 AGGTCCCCCC ACTGCTGGCC CTTCTACGGC CACTACTGTG CCTTTGAGTC CX^GTGGACGA 

1561 TGCCTGCAAC GTGAACATCT TCGACGCCAT CGCGGAGATT GGGAACCAGC TGTATTTGTT 

1621 CAAGGATGGG AAGTACTGGC GATTCTCTGA GGQCAGGGGG AGCCGGCCGC AGGGCCCCTT 

1681 CCTTATCGCC GACAAGTGGC CCGCGCTGCC CCGCAAGCTG GACTCGGTCT TTGAGGAGGC . 

1741 GCTCTCCAAG AAGCTTTTCT TCTTCTCTGG GCGCCAGGTG TGGGTGTACA CAGGCGCGTC 

1801 GGTGCTGGGC CCGAGGCGTC TGG ACAAGCT GGGCCTGGGA GCCGACGTGG CCCAGGTQAC 

1861 CGGGGCCCTC CGGAGTGGCA GGGGGAAGAT GCTGCTGTTC AGCGGGCGGC GCCTCTGGAG 

1921 GTTCGACGTG AAGGCGCAGA TGGTGGATCC CCGGAGCGCC AGCGAGGTGG ACCGGATGTT . 

1981 CCCCGGGGTG CCTTTGGACA CGCACGACGT CTTCCAGTAC CGAGAGAAAG CCTATTTCTG 

2041 CCAGGACCGC TTCTACTGGC GCGTGAGTTC CCGGAGTGAG TTGAACCAGG TGGACCAAGT 

2101 GGGCTACGTG ACCTATGACA TCCTGCAGTG CCCTGAGGAC TAGGGCTCCC GTCCTGCTTT 

2161 GCAGTGCCAT GTAAATCCCC ACTGGGACCA ACCCTGGGGA AGGAGCCAGT TTGCCGGATA 

2221 CAAACTGGTA TTCTGTTCTG GAGGAAAGGG AGGAGTGGAG GTGGGCTGGG CCCTCTCTTC 

2281 TCACCTTTGT TTTTTGTTGG AGTGTTTCTA ATAAACTTGG ATTCTCTAAC CTTT 
HUMAN OSTEOCLAST-SPECIFIC AND -RELATED GENES

RELATED APPLICATION

This application is a continuation of application Ser. No.
08/045,270 filed on Apr. 6, 1993, now abandoned.

BACKGROUND OF THE INVENTION 

Excessive bone resorption by osteoclasts contributes to
the pathology of many human diseases including arthritis,
osteoporosis, periodontitis, and hypercalcemia of malignancy.
During resorption, osteoclasts remove both the mineral
and organic components of bone (Blair, H. C., et al., J.
Cell Biol. 102:1164 (1986)). The mineral phase is solubilized
by acidification of the sub-osteoclastic lacuna, thus
allowing dissolution of hydroxyapatite (Vaes, G., Clin.
Orthop. Relat. 231:239 (1988)). However, the mechanism(s)
by which type I collagen, the major structural protein of
bone, is degraded remains controversial. In addition, the
regulation of osteoclastic activity is only partly understood.
The lack of information concerning osteoclast function is
due in part to the fact that these cells are extremely difficult
to isolate as pure populations in large numbers. Furthermore,
there are no osteoclastic cell lines available. An approach to
studying osteoclast function that permits the identification of
heretofore unknown osteoclast-specific or -related genes and
gene products would allow identification of genes and gene
products that are involved in the resorption of bone and in
the regulation of osteoclastic activity. Therefore, identification
of osteoclast-specific or -related genes or gene products
would prove useful in developing therapeutic strategies for
the treatment of disorders involving aberrant bone resorption.






SUMMARY OF THE INVENTION 



The present invention relates to isolated DNA sequences
encoding all or a portion of osteoclast-specific or -related
gene products. The present invention further relates to DNA
constructs capable of replicating DNA encoding osteoclast-specific
or -related gene products. In another embodiment,
the invention relates to a DNA construct capable of directing
expression of all or a portion of the osteoclast-specific or
-related gene product in a host cell.

Also encompassed by the present invention are prokaryotic
or eukaryotic cells transformed or transfected with a
DNA construct encoding all or a portion of an osteoclast-specific
or -related gene product. According to a particular
embodiment, these cells are capable of replicating the DNA
construct comprising the DNA encoding the osteoclast-specific
or -related gene product, and, optionally, are capable
of expressing the osteoclast-specific or -related gene product.
Also claimed are antibodies raised against osteoclast-specific
or -related gene products, or portions of these gene
products.

The present invention further embraces a method of
identifying osteoclast-specific or -related DNA sequences
and DNA sequences identified in this manner. In one
embodiment, cDNA encoding osteoclast-specific or -related
gene products is identified as follows: First, human giant
cell tumor of the bone was used to 1) construct a cDNA
library; 2) produce 32P-labelled cDNA to use as a stromal
cell+, osteoclast+ probe, and 3) produce (by culturing) a
stromal cell population lacking osteoclasts. The presence of
osteoclasts in the giant cell tumor was confirmed by
histological staining for the osteoclast marker, type 5
tartrate-resistant acid phosphatase (TRAP) and with the use
of monoclonal antibody reagents.

The stromal cell population lacking osteoclasts was produced
by dissociating cells of a giant cell tumor, then
growing and passaging the cells in tissue culture until the
cell population was homogeneous and appeared fibroblastic.
The cultured stromal cell population did not contain osteoclasts.
The cultured stromal cells were then used to produce
a stromal cell+, osteoclast- 32P-labelled cDNA probe.

The cDNA library produced from the giant cell tumor of
the bone was then screened in duplicate for hybridization to
the cDNA probes: one screen was performed with the giant
cell tumor cDNA probe (stromal cell+, osteoclast+), while a
duplicate screen was performed using the cultured stromal
cell cDNA probe (stromal cell+, osteoclast-). Hybridization
to a stromal+, osteoclast+ probe, accompanied by failure to
hybridize to a stromal+, osteoclast- probe, indicated that a
clone contained nucleic acid sequences specifically
expressed by osteoclasts.

In another embodiment, genomic DNA encoding osteoclast-specific
or -related gene products is identified through
known hybridization techniques or amplification techniques.
In one embodiment, the present invention relates to a
method of identifying DNA encoding an osteoclast-specific
or -related protein, or gene product, by screening a cDNA
library or a genomic DNA library with a DNA probe
comprising one or more sequences selected from the group
consisting of the DNA sequences set out in Table I (SEQ ID
NOs: 1-32). Finally, the present invention relates to an
osteoclast-specific or -related protein encoded by a nucleotide
sequence comprising a DNA sequence selected from
the group consisting of the sequences set out in Table I, or
their complementary strands.
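
Because the sequences of Table I are recited together with their complementary strands, it can help to see how a complementary strand is derived from a listed sequence. The sketch below is illustrative only and is not taken from the patent; the function name `reverse_complement` is hypothetical, and it assumes plain A/C/G/T/N input with Watson-Crick pairing.

```python
# Watson-Crick pairing; N (unknown base) pairs with N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical short input, not a sequence from Table I.
strand = reverse_complement("ATGC")
```

A probe designed against either strand would be expected to hybridize to the other under the standard conditions the text refers to.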

BRIEF DESCRIPTION OF FIG. 1

FIG. 1 shows the cDNA sequence (SEQ ID NO: 33) of
human gelatinase B, and highlights those portions of the
sequence represented by the osteoclast-specific or -related
cDNA clones of the present invention.

DETAILED DESCRIPTION OF THE 
INVENTION 

As described herein, Applicant has identified osteoclast-specific
or osteoclast-related nucleic acid sequences. These
sequences were identified as follows: Human giant cell
tumor of the bone was used to 1) construct a cDNA library;
2) produce 32P-labelled cDNA to use as a stromal cell+,
osteoclast+ probe, and 3) produce (by culturing) a stromal
cell population lacking osteoclasts. The presence of osteoclasts
in the giant cell tumor was confirmed by histological
staining for the osteoclast marker, type 5 acid phosphatase
(TRAP). In addition, monoclonal antibody reagents were
used to characterize the multinucleated cells in the giant cell
tumor, which cells were found to have a phenotype distinct
from macrophages and consistent with osteoclasts.

The stromal cell population lacking osteoclasts was produced
by dissociating cells of a giant cell tumor, then
growing the cells in tissue culture for at least five passages.
After five passages the cultured cell population was homogeneous
and appeared fibroblastic. The cultured population
contained no multinucleated cells at this point, tested negative
for type 5 acid phosphatase, and tested variably alkaline
phosphatase positive. That is, the cultured stromal cell
population did not contain osteoclasts. The cultured stromal
cells were then used to produce a stromal cell+, osteoclast-
32P-labelled cDNA probe.

The cDNA library produced from the giant cell tumor of
the bone was then screened in duplicate for hybridization to
the cDNA probes: one screen was performed with the giant
cell tumor cDNA probe (stromal cell+, osteoclast+), while a
duplicate screen was performed using the cultured stromal
cell cDNA probe (stromal cell+, osteoclast-). Clones that
hybridized to the giant cell tumor cDNA probe (stromal+,
osteoclast+), but not to the stromal cell cDNA probe
(stromal+, osteoclast-), were assumed to contain nucleic acid
sequences specifically expressed by osteoclasts.
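
The selection rule of the differential screen described above amounts to a set difference: keep clones that give a signal with the giant cell tumor probe and no signal with the stromal cell probe. A minimal sketch of that rule follows; the clone identifiers and the function name `osteoclast_candidates` are hypothetical and serve only to illustrate the logic.

```python
def osteoclast_candidates(tumor_positive, stromal_positive):
    """Clones hybridizing to the tumor probe but not to the stromal probe."""
    return sorted(set(tumor_positive) - set(stromal_positive))


# Hypothetical hybridization results for four clones.
tumor_hits = ["clone_A", "clone_B", "clone_C", "clone_D"]
stromal_hits = ["clone_B", "clone_D"]
candidates = osteoclast_candidates(tumor_hits, stromal_hits)
```

Clones positive with both probes are treated as stromal in origin and dropped, which is why the stromal cell probe must come from a culture free of osteoclasts.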

As a result of the differential screen described herein,
DNA specifically expressed in osteoclast cells characterized
as described herein was identified. This DNA, and equivalent
DNA sequences, is referred to herein as osteoclast-specific
or osteoclast-related DNA. Osteoclast-specific or
-related DNA of the present invention can be obtained from
sources in which it occurs in nature, can be produced
recombinantly or synthesized chemically; it can be cDNA,
genomic DNA, recombinantly-produced DNA or chemically-produced
DNA. An equivalent DNA sequence is one
which hybridizes, under standard hybridization conditions,
to an osteoclast-specific or -related DNA identified as
described herein or to a complement thereof.

Differential screening of a human osteoclastoma cDNA
library was performed to identify genes specifically
expressed in osteoclasts. Of 12,000 clones screened, 195
clones were identified which are either uniquely expressed
in osteoclasts, or are osteoclast-related. These clones were
further identified as osteoclast-specific, as evidenced by
failure to hybridize to mRNA derived from a variety of
unrelated human cell types, including epithelium, fibroblasts,
lymphocytes, myelomonocytic cells, osteoblasts, and
neuroblastoma cells. Of these, 32 clones contain novel
cDNA sequences which were not found in the GenBank
database.

A large number of cDNA clones obtained by this procedure
were found to represent 92 kDa type IV collagenase
(gelatinase B; E.C. 3.4.24.35) as well as tartrate resistant
acid phosphatase. In situ hybridization localized mRNA for
gelatinase B to multinucleated giant cells in human
osteoclastomas. Gelatinase B immunoreactivity was demonstrated
in giant cells from 8/8 osteoclastomas, osteoclasts in
normal bone, and in osteoclasts of Paget's disease by use of
a polyclonal antisera raised against a synthetic gelatinase B
peptide. In contrast, no immunoreactivity for 72 kDa type IV
collagenase (gelatinase A; E.C. 3.4.24.24), which is the
product of a separate gene, was detected in osteoclastomas
or normal osteoclasts.

The present invention has utility for the production and
identification of nucleic acid probes useful for identifying
osteoclast-specific or -related DNA. Osteoclast-specific or
-related DNA of the present invention can be used to
produce osteoclast-specific or -related gene products useful
in the therapeutic treatment of disorders involving aberrant
bone resorption. The osteoclast-specific or -related
sequences are also useful for generating peptides which can
then be used to produce antibodies useful for identifying
osteoclast-specific or -related gene products, or for altering
the activity of osteoclast-specific or -related gene products.
Such antibodies are referred to as osteoclast-specific
antibodies. Osteoclast-specific antibodies are also useful for
identifying osteoclasts. Finally, osteoclast-specific or
-related DNA sequences of the present invention are useful in
gene therapy. For example, they can be used to alter the
expression in osteoclasts of an aberrant osteoclast-specific
or -related gene product or to correct aberrant expression of
an osteoclast-specific or -related gene product. The
sequences described herein can further be used to cause
osteoclast-specific or -related gene expression in cells in
which such expression does not ordinarily occur, i.e., in cells
which are not osteoclasts.

Example 1— Osteoclast cDNA Library Construction

Messenger RNA (mRNA) obtained from a human osteoclastoma
('giant cell tumor of bone'), was used to construct
an osteoclastoma cDNA library. Osteoclastomas are actively
bone resorptive tumors, but are usually non-metastatic. In
cryostat sections, osteoclastomas consist of ~30% multinucleated
cells positive for tartrate resistant acid phosphatase
(TRAP), a widely utilized phenotypic marker specific
in vivo for osteoclasts (Minkin, Calcif. Tissue Int.
34:285-290 (1982)). The remaining cells are uncharacterized
'stromal' cells, a mixture of cell types with fibroblastic/
mesenchymal morphology. Although it has not yet been
definitively shown, it is generally held that the osteoclasts in
these tumors are non-transformed, and are activated to
resorb bone in vivo by substance(s) produced by the stromal
cell element.

Monoclonal antibody reagents were used to partially
characterize the surface phenotype of the multinucleated
cells in the giant cell tumors of long bone. In frozen sections,
all multinucleated cells expressed CD68, which has previously
been reported to define an antigen specific for both
osteoclasts and macrophages (Horton, M. A. and M. H.
Helfrich, In Biology and Physiology of the Osteoclast, B. R.
Rifkin and C. V. Gay, editors, CRC Press, Inc., Boca Raton,
Fla., 33-54 (1992)). In contrast, no staining of giant cells
was observed for CD11b or CD14 surface antigens, which
are present on monocyte/macrophages and granulocytes
(Arnaout, M. A. et al., J. Cell Physiol. 137:305 (1988);
Haziot, A. et al., J. Immunol. 141:547 (1988)). Cytocentrifuge
preparations of human peripheral blood monocytes
were positive for CD68, CD11b, and CD14. These results
demonstrate that the multinucleated giant cells of osteoclastomas
have a phenotype which is distinct from that of
macrophages, and which is consistent with that of osteoclasts.

Osteoclastoma tissue was snap frozen in liquid nitrogen
and used to prepare poly A+ mRNA according to standard
methods. cDNA cloning into a pcDNAII vector was carried
out using a commercially-available kit (Librarian, InVitrogen).
Approximately 16x10* clones were obtained, >95%
of which contained inserts of an average length 0.6 kb.

Example 2— Stromal Cell mRNA Preparation 

A portion of each osteoclastoma was snap frozen in liquid
nitrogen for mRNA preparation. The remainder of the tumor
was dissociated using brief trypsinization and mechanical
disaggregation, and placed into tissue culture. These cells
were expanded in Dulbecco's MEM (high glucose, Sigma)
supplemented with 10% newborn calf serum (MA Bioproducts),
gentamycin (0.5 mg/ml), l-glutamine (2 mM) and
non-essential amino acids (0.1 mM) (Gibco). The stromal
cell population was passaged at least five times, after which
it showed a homogeneous, fibroblastic looking cell population
that contained no multinucleated cells. The stromal cells
were mononuclear, tested negative for acid phosphatase, and
tested variably alkaline phosphatase positive. These findings
indicate that propagated stromal cells (i.e., stromal cells that
are passaged in culture) are non-osteoclastic and non-activated.

Example 3— Identification of DNA Encoding
Osteoclastoma-Specific or -Related Gene Products
by Differential Screening of an Osteoclastoma
cDNA Library

A total of 12,000 clones drawn from the osteoclastoma
cDNA library were screened by differential hybridization,
using mixed 32P-labelled cDNA probes derived from (1)
giant cell tumor mRNA (stromal cell+, OC+), and (2) mRNA
from stromal cells (stromal cell+, OC-) cultivated from the
same tumor. The probes were labelled with 32[P]dCTP by
random priming to an activity of ~10? CPM/µg. Of these
12,000 clones, 195 gave a positive hybridization signal with
giant cell (i.e., osteoclast and stromal cell) mRNA, but not
with stromal cell mRNA. Additionally, these clones failed to
hybridize to cDNA produced from mRNA derived from a
variety of unrelated human cell types including epithelial
cells, fibroblasts, lymphocytes, myelomonocytic cells,
osteoblasts, and neuroblastoma cells. The failure of these
clones to hybridize to cDNA produced from mRNA derived
from other cell types supports the conclusion that those
clones are either uniquely expressed in osteoclasts, or are
osteoclast-related.

The osteoclast (OC) cDNA library was screened for
differential hybridization to OC cDNA (stromal cell+, OC+)
and stromal cell cDNA (stromal cell+, OC-) as follows:

NYTRAN filters (Schleicher & Schuell) were placed on
agar plates containing growth medium and ampicillin.
Individual bacterial colonies from the OC library were randomly
picked and transferred, in triplicate, onto filters with
preruled grids and then onto a master agar plate. Up to 200
colonies were inoculated onto a single 90-mm filter/plate
using these techniques. The plates were inverted and incubated
at 37° C. until the bacterial inoculates had grown (on
the filter) to a diameter of 0^1.0 mm.
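
As a rough planning aid, the figures above (12,000 clones screened in total, up to 200 colonies per 90-mm filter, colonies transferred in triplicate) imply a minimum number of filters. The sketch below is an assumption-laden lower bound, not a figure reported in the text: it assumes every grid is filled to the stated maximum and that "in triplicate" means three replica filters per grid. The function name `min_filters` is hypothetical.

```python
import math
from typing import Tuple


def min_filters(total_clones: int, per_filter: int = 200,
                replicates: int = 3) -> Tuple[int, int]:
    """Return (distinct grid layouts, total replica filters) as a lower bound."""
    layouts = math.ceil(total_clones / per_filter)
    return layouts, layouts * replicates


# 12,000 clones at no more than 200 colonies per filter.
layouts, filters = min_filters(12000)
```

Under these assumptions the screen needs at least 60 distinct grids; partially filled filters would raise that number.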

The colonies were then lysed, and the DNA bound to the
filters by first placing the filters on top of two pieces of
Whatman 3 MM paper saturated with 0.5N NaOH for 5
minutes. The filters were neutralized by placing on two
pieces of Whatman 3 MM paper saturated with 1M Tris-HCl,
pH 8.0 for 3-5 minutes. Neutralization was followed
by incubation on another set of Whatman 3 MM papers
saturated with 1M Tris-HCl, pH 8.0/1.5M NaCl for 3-5
minutes. The filters were then washed briefly in 2xSSC.

DNA was immobilized on the filters by baking the filters
at 80° C. for 30 minutes. Filters were best used immediately,
but they could be stored for up to one week in a vacuum jar
at room temperature.

Filters were prehybridized in 5-8 ml of hybridization
solution per filter, for 2-4 hours in a heat sealable bag. An
additional 2 ml of solution was added for each additional
filter added to the hybridization bag. The hybridization
buffer consisted of 5xSSC, 5xDenhardt's solution, 1% SDS
and 100 µg/ml denatured heterologous DNA.

Prior to hybridization, labeled probe was denatured by
heating in 1xSSC for 5 minutes at 100° C., then immediately
chilled on ice. Denatured probe was added to the filters in
hybridization solution, and the filters hybridized with continuous
agitation for 12-20 hours at 65° C.

After hybridization, the filters were washed in 2xSSC/
0.2% SDS at 50°-60° C. for 30 minutes, followed by
washing in 0.2xSSC/0.2% SDS at 60° C. for 60 minutes.

The filters were then air dried and autoradiographed using
an intensifying screen at -70° C. overnight.

Example 4—DNA Sequencing of Selected Clones

Clones reactive with the mixed tumor probe, but unreactive
with the stromal cell probe, are expected to contain
either osteoclast-related, or in vivo 'activated' stromal-cell-related
gene products. One hundred and forty-four cDNA
clones that hybridized to tumor cell cDNA, but not to
stromal cell cDNA, were sequenced by the dideoxy chain
termination method of Sanger et al. (Sanger F., et al., Proc.
Natl. Acad. Sci. USA 74:5463 (1977)) using sequenase (US
Biochemical). The DNASIS (Hitachi) program was used to
carry out sequence analysis and a homology search in the
GenBank/EMBL database.

Fourteen of the 195 tumor+ stromal- clones were identified
as containing inserts with a sequence identical to the
osteoclast marker, type 5 tartrate-resistant acid phosphatase
(TRAP) (GenBank accession number J04430 M19534). The
high representation of TRAP positive clones also indicates
the effectiveness of the screening procedure in enriching for
clones which contain osteoclast-specific or -related cDNA
sequences.

Interestingly, an even larger proportion of the tumor+
stromal- clones (77/195; 39.5%) were identified as human
gelatinase B (macrophage-derived gelatinase) (Wilhelm, S.
M., J. Biol. Chem. 264:17213 (1989)), again indicating high
expression of this enzyme by osteoclasts. Twenty-five of the
gelatinase B clones were identified by dideoxy sequence
analysis; all 25 showed 100% sequence homology to the
published gelatinase B sequence (GenBank accession number
J05070). The portions of the gelatinase B cDNA
sequence covered by these clones are shown in the FIGURE
(SEQ ID NO: 33). An additional 52 gelatinase B clones were
identified by reactivity with a 32P-labelled probe for gelatinase B.
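
The proportions quoted in this example can be re-derived from the clone counts given in the text. The short sketch below only repeats that arithmetic; the variable and function names are illustrative and not from the source.

```python
def percent(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


SCREENED = 12000             # clones screened by differential hybridization
POSITIVE = 195               # tumor+ stromal- clones
TRAP = 14                    # clones identical to TRAP
GELATINASE_B_SEQUENCED = 25  # identified by dideoxy sequencing
GELATINASE_B_BY_PROBE = 52   # identified with the gelatinase B probe

gelatinase_b_total = GELATINASE_B_SEQUENCED + GELATINASE_B_BY_PROBE
gelatinase_b_share = percent(gelatinase_b_total, POSITIVE)
trap_share = percent(TRAP, POSITIVE)
positive_rate = percent(POSITIVE, SCREENED, digits=3)
```

The 25 sequenced and 52 probe-identified clones sum to the 77 gelatinase B clones reported, and 77 of 195 rounds to the stated 39.5%.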

Thirteen of the sequenced clones yielded no readable
sequence. A DNASIS search of GenBank/EMBL databases
revealed that, of the remaining 91 clones, 32 clones contain
novel sequences which have not yet been reported in the
databases or in the literature. These partial sequences are
presented in Table I. Note that three of these sequences were
repeats, indicating fairly frequent representation of mRNA
related to this sequence. The repeat sequences are indicated
by * superscripts (Clones 198B, 223B and 32C of Table I).



TABLE I 

PARTIAL SEQUENCES OF 32 NOVEL OC-SPECIFIC OR -RELATED
EXPRESSED GENES (cDNA CLONES)



34A (SEQ ID NO: 1)
1 CCAAATArCr AACTTTATTG c ncGAnrc taotcagacc TGI IGAATTT GGTOATOTCA
61 AATorrrCTA GCCTTmTT AGTTTGTTTT tattcaaaaa TTTAAITAIT TATCCIArAC
121 CTGATATTCr CnrCAATAA ACaATAATA gaaaatacca GCAGACAACA
4B (SEQ ID NO: 2)
1 CTCTCAACCT GCATArCCTA AAAATCTCAA aatoctccat CrOCTTAATC TCCOGGTACG




61 CGO 
12B(5EQ1DNO:3) 
I CTTCCCTCTC 
61 CACCCCCACA 
121 CAAOCACCrO 
2BB (S EQ ID NO: 4) 
I TnT ATTTCT 
61 CTCTCTTTTC 
121 AAAOCAAACr 
37B (SEQ ID Na 5) 
1 CGCTGGACAT 
61 TTCCCCrCCC 

121 AGCCACrrrc 

181 ACAAAAAAAA 
553 (SEQ ID NO: 6} 
I TTGACAAAGC 
61 AAGACTAGTC 
121 TAATTTGCCr 
60B(SEQIDNO: 7) 
I GAAGAGACTT 
61 GATCCCGAGG 
86B (SEQ ID NO: 8} 
1 CCATOGAAAC 
61 CCAAACCTGA 

121 TCGrrccrcrr 

87B (SEQ ED NO: 9) 

1 rrcTTGATcr 

61 TAGGAG(XCT 
181 CAAIGATAAA 
98B (SEQ ID NO: m 
1 ACCCATITCr 
61 CTCAAAGAAT 
121 GAATATGAGG 
HOB (SEQ ID KO: 11) 
1 ACATATATTA 
61 TAAAGTXSGOA 

121 TAAcnrmr 

n8B(S£QIDNO: 12) 
1 CCAAATTTCT 

61 •nrcACTAcr 

I33B (SEQ ID NO: 13) 
I AACTAACaC 
61 CCTGAGCCAT 
121 AAAT 
140B (SEQ ID NO: 14) 
I ATTATrATTC 
61 AAAACACACA 
121 CATAAACCCO 
144B (SEQ ID NO: 15) 
I CGTGACACAA 
61 AACAGCATCn- 
198B*(5EQIDNO: 16) 
1 ATACCHTAGA 
61 ATCTGACrrC 
121 TCTACTCCAA 
181 ATCTCAnTG 
241 TTTAAT 
21 2B (SEQ ID NO: 17) 
1 CTCCAGTATA 
61 CCrCTAGATA 
121 AATCGCCTTC 
181 TCrCCAGC 
223B' (SEQ ID NO: 18) 
1 CCACrrTGGAA 
61 TCTTCACnr 
121 CCATGAOCTT 
181 TAAGACATGT 
241BCSEQIDNO|l9) 
I _ TCTTAOrri'l 
61 CTAGACGTCX: 
121 CGAACCCCTC 
181 CTATATOACC 
32C* (SEQ ID NO: 20) 

I an-ATTTcro 

121 ICCCTCTAVC 
161 GCKTTCGAACG 



TTGcrrcccr 

GGGAGTACTG 
GTGGTGAATG 


TTCCCAAGCA 
CCAGACTACr 
CrCCCTGGCA 


GAGCTGCTCA 

ccrcAKnrc 

CCGGACCCCC 


CTCCATCGCr 
TCTTAAGGCC 
CCC 


ACCCCCA(XA 
CAGGGACTCr 


AAATATATOT 
CTCTTGCTTC. 
CGCGCOATGO . 


ATTACATCCC 
TTCATGGTCC 
AAGCAOATTA 


TAGAAAAAGA 
ATQATCOCAC 
TTCTCCCATT 


ATCCCAGCAT 
CrCACCTTCT 
1 IICCAGCjTC 


TTTOCCTCCT 
CACn^ACAATG 
T7T 


CCGTCCCCrc 
CATCTCATCr 
TTACGCCACC 
AAAAAAA 


CACCTCOCTC 
ACCTGOACTG 

ArrrcccAGA 


ATATCCOCAG 

GCOOCTOOCCC 

CCACTCAICA 


GCACACTCTG 
TTCTTCAGCC 
CAITAAAAAA 


OCCTCAGCTT 
TTOAATCAAA 
TAmrOAAA 


TGTTTATTTC 
CCrATrATAT 
TC 


CACCAATAAA 
CGCGTATCAr 


TACn-ATATGO 
GTTGATGCTC 


TGATTGGGCT 

ArAAArA(?rT 


TrCTATTTAr 
CATATCTACT 


GTATGTACAA 
GAATT 


COCCAACACMS 


CAAGGCAGCT 


AAATGCAGAG 


GGTACAGAGA 


ArOTACiAAGT 

gatttcagca 
tccacgtatc 


CCACAGAAAA 

TAAAATCTTT 

AATAGCTTAT 


ACAAmTAA 
ACTTAGAAGT 
C 


AAAAAGGTCC 
OAOACAAAGA 


AAAAGmCG 
AGAGGGAGGC 


TTAOAACACT 
CCmTGGAA 
ACTTGACAAA 


ArGAATAGCC 
TGCTTGACnXj 
A 


AAAAAAGAAA 
AGGAGCrCAA 


AAACTGTTCA 

crAAGTCxn-cr 


AAATAAAATG 
CCCAAGAAAG 


AACAAl'l i n 

AGAC(}CAATA 

ACAACCTCTA 


ACKTTAAAAT 
TATAGCuCAT 
CrCCTCATTA 


TTTTCCrCAA 
CTTACTAGAC 
AAOCCCrCAG 


AGTTCTAAGC 
ATACAGTArT 
AA 


TTAATCACAT 

AAA ^^^^ A 

AAACTGGACT 


ACAGCATTCA 
ATGTATCAAG 
TTTTTACATT 


TTTCGCCAAA 
TATAGACTAT 
ATAAAATTAA 


ATCTACACCr 
GAAAGTOCAA 
CTTOTTT 


TTGTAGAATC 
ATAACAAGTC 


CTACTCTATA 
AAGGTTAGAT 


CrGCAAr(XA 

cx:agc 


Tccrcocrcc 


c:ArcACCArA 


GCCrCGAOAC 


CTCATTTCTG 


GGCCATCCCr 


TATGAGCOGC 


ATTTAuaCVA 

OCACTGAITA 


TAGGcrrrcG 


tat^tata a a 
tatctataaa 

CrCTAAGATA 


TirrrrrATG 

TCCCAITGAA 
CCACGTCCrC 


TTAGCITAOC 
GGGll J ICjTA . 
ATAtSGAAATT 


CATCCAAAAT 

cAnrcAcrc 
c 


TTACrCCTGA 
CTTACAAATA 


AGCACTTTAAT 
ACAAAGCAAT 


ACATGCATTC 
TCATCAGCAC 


GTTnATTCA 
GAAGCTCGCC 


TAAAACAGCC 
GTGGGCAGGG 


TGGTTTCCTA 
CGGCC 


AAACAATACA 


TTCTCATTCA 

TCACrrcxrrA 

TTCATAAATC 

Tcrr(xcrrc 


CGCGACTAGT 
AGTTCCCTCr 
TATTCATAAG 
TTTCiCACTTT 


TAGCTTTAAC 
TATATCCn-CA 
TCmCGTAC 
TRAAATAAAG 


CACCCTAGAG 
AGGTAGAAAT 
AACnTACATO 
TATITArCTC 


GACTAGGCTTA 
CTCTArCTTT 
ATAAAAACAA 
CTGTCTACAG 


AAGGAAACCC 
AAACACXXGA 
TACACATTAG 


mAGTCGGT 
TTAACAGATG 
CTCCACCTAA 


AAGCTAGAGG 

TTAACCTTTT 

AAAGACACAT 


AITCTAAATA 
ATGUilGAr 
TGACACCTTA 


TCTTTTATCT 
TTCCTTTAAA 
CAGGATA(jTC 


CGGACTTGGT 
CCCCATTTGT 
TTTCACTCTC 
GACTACAGCC 


GTGCTAi J 1 1 
TrCTCCTTCA 
GCCATCAAGG 
TCCCCCTGAC 


TGAAGCAGAT 
AATGATOCTT 
ACntCCTGA 
TG 


GTGGTOATAC 
CCTACTTTGC 
CAGCTTCTCT 


TCAOATTOTC 
TTCTCTCCAC 
ACTCTTAGGC 



tagcaacccc 
tatagttagt 

TrTGCTAGTA 
ATAGTAACGC 

ArCCrCACTT 
ACAGCGTCCA 
GCCAGGATTC 



TCTCTTCTGG 
CACTG CGGAT 
TCrCCATTTC 
TCT 

TGCACAACXiC 
CTTGTGArOC 
TCCACCTGCr 



GAOTOAGCTT 
CGTGAAAGAG 
TACAACATGG 



CCnCAGOCA 

taaaataagc 
tttccatttc 



TAITACTOCA 

GGAGAAGAGG 

TTTAGATCAr 



OAAGACTGAC 

TTCATCrCOC 

TCTTCCTAAA 



CrrCTTGCAG 

aagggcgaag 

AACCACAGCnr 



aaagtcatcc 

(jCTCTGCCTT 
nTCATT 



5,552,281
PARTIAL SEQUENCES OF 32 NOVEL OC-SPECIFIC OR -RELATED
EXPRESSED GENES (cDNA CLONES)



34C (SEQ ID NO: 21)
1 CGGAGCQTAO 
61 COGCCCCCAC 
47C (SEQ ID NO: 22)
I TTAGTrCACT 
61 CTGGCACCrG 
121 GGAGCTOACC 
65C (SEQ ID NO: 23)

I GcrcAAicrr 

61 TGCAACTCTG 
121 AAaGCCCOT 
79C (SEQ ID NO: 24)
1 GCCACTCGGA 
61 AOAAAACTGC. 
121 OOTGCCAAC 
84C (SEQ ID NO: 25)
1 GCCAOGGCGO 
61 GACCrcCAGT 
121 OffTGCCTGAG 
86C (SEQ ID NO: 26) 
1 AACTCnrCA 
61 CTTCAXATCA 
121 TTCAATTATA 
87C (SEQ ID NO: 27)
1 CGATAAGAAA 
61 CGCAGCAGCC 
121 GTCCrCGTTG 
88C (SEQ ID NO: 28)

I cTCACcrrcc 

61 TGTTCAACCC 
89C (SEQ ID NO: 29)
1 ATOCCTGOCT 
61 TC CCTGA GTT 
121 TCGTTTTCTG 
101C (SEQ ID NO: 30)
I GCCrrCGGCAT 
61 CTCCCAGCXrC 
121 COrrAGCTTT 
inC(SE0IDNO:31) 
1 CCAACTCCTA 
161 CAATACTCTC 
114C (SEQ ID NO: 32)
1 CATGGATGAA 



GTOTOnTAT 
CCATCACCCC 

CAAAGCACGC 
CCGAGGmC 
CAGAOTGGA 

TAAGAGAOiO' 
AAITACGTGG 
TTAOAGTCCT 

TATCGAATCC 

CGAAACAAAG 

CrCGCCAOCT 



TCCTOTACAA 
AOTGCAATGG 



ACOjrcnTA 

GGCOCCTAGT 
TAQAACrrCT 

CACrCTGGTA 

ATTCATAITG 

AGAATATATC 

GAAGGCCTGA 
CGCACACGTT 
GCCGGTGCAC 

AOAOTTTOAC 
AGCCGTCAGC 

GTCGATAGTG 
TCGOAGTGTO 
GTOATCnTGT 

CCCrCTCCTC 
GGCTCTOAAO 
CCCATAAGGT 

COGCGATACA 
CTAAAATAAA 

TGTCTCATGG 



AACCCCCTTT 
CCCAACACCC 



TTTGCTCTTA 
TArGGATGGT 

crrAATArrc 

ACAAGGGAAA 

GATAIATOCr 

TCCOCAAGAT 

TTOCTCTCCT 
CATCTGrGGC 
TCTGGAAnC 

iilUAUij I 

AGCIGTCTCA 

CTAATACnr 

GGCCTAGGGG 
GAGAGOGGCA 
ACOCAOAAA 

CTOGAGCCGG 
OACOACTCCG 

CTTTTGTGTA 
GAAGTACTAC 
GCTAACAATA 

CTCCATCCCC 
CCAAGGGCCG 
TGGAGTATCr 

GACCCACAOA 
CATGAAGCAC 

TGGGAAGGAA 



ATCATTACAA 
CTAGCTGCrC 

GGCACTGCrC 
TCCTCTGCIT 



AAGGCTTCAT 
TG LTimilA 
ATGTOCTAAC 

CAAGCACTGO 
CATGGCrCOA 
GTGACTCCAO 

CtXTCAGAGG 
AGCGAAGGTG 
C 

A ACAATATAT 
TTCTTTTTTT 
TTAAAA 

CCGRGGCTGC 
CTTCCTCTTG 



ATACCTACTG 
GTCGGGAACTT 

CCAAATGCrC 
TTAACrGTCr 
AGAATAC 

ATACATCACC 
TCCGTGCCAC 
CC 

GTGCCATCCC 



CAJGGTACAT 



A ACCAA GTCT 
GCCll'i 

OCACTGGOGT 
CCCrGTGTGT 



CATOAAAGTO 
TTAACTAAAG 
ACTGGGTCTG 

ATAATTAAAA 
AATAAGAACA 
CCAGAAA 

TCAGGAAGGA 
AAGGOACrCA 



GTGTTGTGTC 
AATGGTCATA 



CCIGC GTCT C 
ClTAGGTrCG 



CCGCTAIGAC 
TCTGCGGCGA 



ocrccnAAG 

GTOCTGCTTG 



AGCrCTAATC 
GGTGGCrCTG 



TGACAGACCA 
TTC 



GGGGCACTCA 



ogGGcqgrr 

CGGGGTCTCA 



TACATCCATA 
ATGTACAGCA 
CTTATGC 

ACA OCrOOG O 
ACGOCTGTGG 



GGTCTGGCAG 
CCTTGTCCCX 



TTGCAAAITA 
TACAGTAOTA 



AGTCCTGGGA 
TGAGGATCTC 



TCGGTCAGOC 
T 



GTTATAGGGC 
GCTCrrOGTTA 



TTTACAAACC 
AGTATTCCTC 



GA00(3CrCCC 



*Repeated 3 times
^Repeated 2 times



Sequence analysis of the OC+ stromal cell- cloned DNA
sequences revealed, in addition to the novel sequences, a
number of previously-described genes. The known genes
identified (including type 5 acid phosphatase, gelatinase B,
cystatin C (13 clones), Alu repeat sequences (11 clones),
creatine kinase (6 clones) and others) are summarized in
Table II. In situ hybridization (described below) directly
demonstrated that gelatinase B mRNA is expressed in
multinucleated osteoclasts and not in stromal cells. Although
gelatinase B is a well-characterized protease, its expression
at high levels in osteoclasts has not been previously
described. The expression in osteoclasts of cystatin C, a
cysteine protease inhibitor, is also unexpected. This finding
has not yet been confirmed by in situ hybridization. Taken
together, these results demonstrate that most of these
identified genes are osteoclast-expressed, thereby confirming the
effectiveness of the differential screening strategy for
identifying DNA encoding osteoclast-specific or -related gene
products. Therefore, novel genes identified by this method
have a high probability of being OC-specific or -related.

In addition, a minority of the genes identified by this
screen are probably not expressed by OCs (Table II). For
example, type III collagen (6 clones), collagen type I (1
clone), dermatan sulfate (1 clone), and type VI collagen (1
clone) are more likely to originate from the stromal cells or
from osteoblastic cells which are present in the tumor. These
cDNA sequences survive the differential screening process
either because the cells which produce them in the tumor in
vivo die out during the stromal cell propagation phase, or
because they stop producing their product in vitro. These
clones do not constitute more than 5-10% of all the
sequences selected by differential hybridization.

TABLE II

SEQUENCE ANALYSIS OF CLONES ENCODING KNOWN
SEQUENCES FROM AN OSTEOCLASTOMA cDNA LIBRARY

Clones with Sequence Homology to Collagenase Type IV                          25 total
Clones with Sequence Homology to Type 5 Tartrate Resistant Acid Phosphatase   14 total
Clones with Sequence Homology to Cystatin C                                   13 total
Clones with Sequence Homology to Alu-repeat Sequences                         11 total
Clones with Sequence Homology to Creatine Kinase                               6 total
Clones with Sequence Homology to Type III Collagen                             6 total
Clones with Sequence Homology to MHC Class I γ Invariant Chain                 5 total
Clones with Sequence Homology to MHC Class II β Chain                          3 total
One or Two Clone(s) with Sequence Homology to Each of the Following:          10 total
    α1 collagen type I
    γ interferon inducible protein
    osteopontin
    α globin
    β glucosidase/sphingolipid activator
    Human CAPL protein (Ca binding)
    Human EST 01024
    Type VI collagen
    Human EST 00553



Example 5 — In situ Hybridization of OC-Expressed Genes

In situ hybridization was performed using probes derived
from novel cloned sequences in order to determine whether
the novel putative OC-specific or -related genes are
differentially expressed in osteoclasts (and not expressed in the
stromal cells) of human giant cell tumors. Initially, in situ
hybridization was performed using antisense (positive) and
sense (negative control) cRNA probes against human type
IV collagenase/gelatinase B labelled with 35S-UTP.

A thin section of human giant cell tumor reacted with the
antisense probe resulted in intense labelling of all OCs, as
indicated by the deposition of silver grains over these cells,
but failed to label the stromal cell elements. In contrast, only
minimal background labelling was observed with the sense
(negative control) probe. This result confirmed that
gelatinase B is expressed in human OCs.

In situ hybridization was then carried out using cRNA
probes derived from 11/32 novel genes, labelled with
digoxigenin UTP according to known methods.

The results of this analysis are summarized in Table III.
Clones 28B, 118B, 140B, 198B, and 212B all gave positive
reactions with OCs in frozen sections of a giant cell tumor,
as did the positive control gelatinase B. These novel clones
therefore are expressed in OCs and fulfill all criteria for
OC-relatedness. 198B is repeated three times, indicating
relatively high expression. Clones 4B, 37B, 88C and 98B
produced positive reactions with the tumor tissue; however,
the signal was not well-localized to OCs. These clones are
therefore not likely to be useful and are eliminated from
further consideration. Clones 86B and 87B failed to give a
positive reaction with any cell type, possibly indicating very
low level expression. This group of clones could still be
useful but may be difficult to study further. The results of this
analysis show that 5/11 novel genes are expressed in OCs,
indicating that ~50% of novel sequences are likely to be
OC-related.

To generate probes for the in situ hybridizations, cDNA
derived from novel cloned osteoclast-specific or -related
cDNA was subcloned into a BlueScript II SK(-) vector. The
orientation of cloned inserts was determined by restriction
analysis of subclones. The T7 and T3 promoters in the
BlueScript II vector were used to generate 35S-labelled (35S-UTP
850 Ci/mmol, Amersham, Arlington Heights, Ill.), or
UTP digoxygenin labelled cRNA probes.

TABLE III

In Situ HYBRIDIZATION USING PROBES
DERIVED FROM NOVEL SEQUENCES

                       Reactivity with:
Clone              Osteoclasts    Stromal Cells
4B                      +               +
28B*                    +
37B                                     +
86B
87B
88C                                     +
98B
118B*                   +
140B*                   +
198B*                   +
212B*                   +
Gelatinase B*

*OC-expressed, as indicated by reactivity with antisense probe and lack of
reactivity with sense probe on OCs only.
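
The tally behind the 5/11 figure can be reproduced from the clone groupings stated in the text. The following is a minimal Python sketch; the dictionary keys and the function name `oc_fraction` are illustrative labels chosen here and are not terms from the original analysis.

```python
from fractions import Fraction

# Outcome of in situ hybridization for each novel clone, grouped as
# described in the text (group names are illustrative labels).
RESULTS = {
    "OC-localized": ["28B", "118B", "140B", "198B", "212B"],
    "positive, not OC-localized": ["4B", "37B", "88C", "98B"],
    "no signal": ["86B", "87B"],
}


def oc_fraction(results):
    """Return the fraction of tested clones whose signal localized to OCs."""
    total = sum(len(clones) for clones in results.values())
    return Fraction(len(results["OC-localized"]), total)


frac = oc_fraction(RESULTS)
print(f"{frac.numerator}/{frac.denominator} = {float(frac):.0%}")
```

The exact ratio is 5/11, about 45%, which the text rounds to roughly 50%.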

In situ hybridization was carried out on 7 micron cryostat
sections of a human osteoclastoma as described previously
(Chang, L.-C. et al. Cancer Res. 49:6700 (1989)). Briefly,
tissue was fixed in 4% paraformaldehyde and embedded in
OCT (Miles Inc., Kankakee, Ill.). The sections were
rehydrated, postfixed in 4% paraformaldehyde, washed, and
pretreated with 10 mM DTT, 10 mM iodoacetamide, 10 mM
N-ethylmaleimide and 0.1 triethanolamine-HCl.
Prehybridization was done with 50% deionized formamide, 10 mM
Tris-HCl, pH 7.0, 1x Denhardt's, 500 mg/ml tRNA, 80
mg/ml salmon sperm DNA, 0.3M NaCl, mM EDTA, and
100 mM DTT at 45° C. for 2 hours. Fresh hybridization
solution containing 10% dextran sulfate and 1.5 ng/ml
35S-labelled or digoxygenin labelled RNA probe was
applied after heat denaturation. Sections were coverslipped
and then incubated in a moistened chamber at 45°-50° C.
overnight. Hybridized sections were washed four times with
50% formamide, 2x SSC, containing 10 mM DTT and 0.5%
Triton X-100 at 45° C. Sections were treated with RNase A
and RNase T1 to digest single-stranded RNA, then washed four
times in 2x SSC/10 mM DTT.

In order to detect 35S-labelling by autoradiography, slides
were dehydrated, dried, and coated with Kodak NTB-2
emulsion. The duplicate slides were split, and each set was
placed in a black box with desiccant, sealed, and incubated
at 4° C. for 2 days. The slides were developed (4 minutes)
and fixed (5 minutes) using Kodak developer D19 and
Kodak fixer. Hematoxylin and eosin were used as
counterstains.

In order to detect digoxygenin-labelled probes, a Nucleic
Acid Detection Kit (Boehringer-Mannheim, Cat #1175041)
was used. Slides were washed in Buffer 1 consisting of 100
mM Tris/150 mM NaCl, pH 7.5, for 1 minute. 100 µl Buffer
2 was added (made by adding 2 mg/ml blocking reagent as
provided by the manufacturer in Buffer 1) to each slide. The
slides were placed on a shaker and gently swirled at 20° C.

Antibody solutions were diluted 1:100 with Buffer 2 (as
provided by the manufacturer). 100 µl of diluted antibody
solution was applied to the slides and the slides were then
incubated in a chamber for 1 hour at room temperature. The
slides were monitored to avoid drying. After incubation with
antibody solution, slides were washed in Buffer 1 for 10
minutes, then washed in Buffer 3 containing 2 mM
levamisole for 2 minutes.
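
As a worked illustration of the antibody step, the volumes needed for a batch of slides can be sketched as follows. This assumes that "1:100" means one part antibody stock in 100 parts final volume, and it adds a hypothetical pipetting overage; the function name and parameters are illustrative only and are not part of the kit protocol.

```python
def antibody_mix(n_slides, per_slide_ul=100.0, dilution=100, overage=0.10):
    """Return volumes (in µl) of antibody stock and Buffer 2 for n slides.

    Assumption: a 1:`dilution` dilution means 1 part stock in `dilution`
    parts of final volume. `overage` is a hypothetical safety margin.
    """
    total = n_slides * per_slide_ul * (1 + overage)
    stock = total / dilution
    return {"total_ul": total, "stock_ul": stock, "buffer2_ul": total - stock}


# Ten slides at 100 µl each, with no overage.
mix = antibody_mix(10, overage=0.0)
print(mix)
```

With no overage, ten slides require 1000 µl of diluted antibody, that is 10 µl of stock in 990 µl of Buffer 2.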

After washing, 100 µl color solution was added to the
slides. Color solution consisted of nitroblue/tetrazolium salt
(NBT) (1:225 dilution) 4.5 µl, 5-bromo-4-chloro-3-indolyl
phosphate (1:285 dilution) 3.5 µl, levamisole 0^ mg in
Buffer 3 (as provided by the manufacturer) in a total volume
of 1 ml. Color solution was prepared immediately before
use.

After adding the color solution, the slides were placed in
a dark, humidified chamber at 20° C. for 2-5 hours and
monitored for color development. The color reaction was
stopped by rinsing slides in TE Buffer.

The slides were stained for 60 seconds in 0.25% methyl
green, washed with tap water, then mounted with
water-based Permount (Fisher).

Example 6 — Immunohistochemistry

Immunohistochemical staining was performed on frozen
and paraffin embedded tissues as well as on cytospin
preparations (see Table IV). The following antibodies were used:
polyclonal rabbit anti-human gelatinase antibodies; Ab110
for gelatinase B; monoclonal mouse anti-human CD68
antibody (clone KP1) (DAKO, Denmark); Mo1 (anti-CD11b)
and Mo2 (anti-CD14) derived from ATCC cell lines HB
CRL 8026 and TIB 228/HB44. The anti-human gelatinase B
antibody Ab110 was raised against a synthetic peptide with
the amino acid sequence EALMYPMYRFTEGPPLHK
(SEQ ID NO: 34), which is specific for human gelatinase B
(Corcoran, M. L. et al. J. Biol. Chem. 267:515 (1992)).

Detection of the immunohistochemical staining was
achieved by using a goat anti-rabbit glucose oxidase kit
(Vector Laboratories, Burlingame, Calif.) according to the
manufacturer's directions. Briefly, the sections were
rehydrated and pretreated with either acetone or 0.1% trypsin.
Normal goat serum was used to block nonspecific binding.
Incubation with the primary antibody for 2 hours or
overnight (Ab110: 1/500 dilution) was followed either by a
glucose oxidase labeled secondary anti-rabbit serum or, in the
case of the mouse monoclonal antibodies, by reaction with
purified rabbit anti-mouse Ig before incubation with the
secondary antibody.

Paraffin embedded and frozen sections from
osteoclastomas (GCT) were reacted with a rabbit antiserum against
gelatinase B (antibody 110) (Corcoran, M. L. et al. J. Biol.
Chem. 267:515 (1992)), followed by color development with
glucose oxidase linked reagents. The osteoclasts of a giant
cell tumor were uniformly strongly positive for gelatinase B,
whereas the stromal cells were unreactive. Control sections
reacted with rabbit preimmune serum were negative.
Identical findings were obtained for all 8 long bone giant cell
tumors tested (Table IV). The osteoclasts present in three out
of four central giant cell granulomas (GCG) of the mandible
were also positive for gelatinase B expression. These
neoplasms are similar but not identical to the long bone giant
cell tumors, apart from their location in the jaws (Shafer, W.
G. et al., Textbook of Oral Pathology, W. B. Saunders
Company, Philadelphia, pp. 144-149 (1983)). In contrast,
the multinucleated cells from a peripheral giant cell tumor,
which is a generally non-resorptive tumor of oral soft tissue,
were unreactive with antibody (Shafer, W. G. et al.,
Textbook of Oral Pathology, W. B. Saunders Company,
Philadelphia, pp. 144-149 (1983)).

Antibody 110 was also utilized to assess the presence of
gelatinase B in normal bone (n=3) and in Paget's disease, in
which there is elevated bone remodeling and increased
osteoclastic activity. Strong staining for gelatinase B was
observed in osteoclasts both in normal bone (mandible of a
2 year old), and in Paget's disease. Staining was again absent
in controls incubated with preimmune serum. Osteoblasts
did not stain in any of the tissue sections, indicating that
gelatinase B expression is limited to osteoclasts in bone.
Finally, peripheral blood monocytes were also reactive with
antibody 110 (Table IV).

TABLE IV 

DISTRIBUTION OF GELATINASE B IN VARIOUS
TISSUES



Samples*            Antibodies tested
                    Ab 110
                    gelatinase B

GCT frozen
(n = 2)
giant cells
stromal cells
(n = 6)
giant cells
stromal cells
central GCG
(n = 4)
giant cells
stromal cells
peripheral GCT
(n = 4)
giant cells
stromal cells
Paget's disease
osteoclasts
osteoblasts
normal bone
(n = 3)
osteoclasts
osteoblasts
monocytes
(cytospin)



+04) 



Distribution of gelatinase B in multinucleated giant cells, osteoclasts,
osteoblasts and stromal cells in various tissues. In general, paraffin embedded
tissues were used for these experiments; exceptions are indicated.

Equivalents 

Those skilled in the art will recognize, or be able to
ascertain using no more than routine experimentation, many
equivalents to the specific embodiments described herein.
Such equivalents are intended to be encompassed by the
following claims.



SEQUENCE LISTING



( 1 ) GENERAL INFORMATION:

( iii ) NUMBER OF SEQUENCES: 34



( 2 ) INFORMATION FOR SEQ ID NO:1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 170 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:1:
OCAAATATCT AAOTTTATTC CTTOOATTTC TAGTCACACC TCTTCAATTT CCTGATCTCA 60 
AATOTTTCTA QOOTTTTTTT AOTTTCTTTT TATTGAAAAA TTTAATTATT TATOCTATAC 120 
GTGATATTCT CTTTCAATAA ACCTATAATA GAAAATAGCA GCAGACAACA 170 



( 2 ) INFORMATION FOR SEQ ID NO:2:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 63 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:2:
GTCTCAACCT OCATATCCTA AAAATGTCAA AATCCTOCAT CTOOTTAATC TCGGCGTAOG 
CGC 



6 0 
63 
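
Each entry in the sequence listing states a LENGTH and prints the sequence in blocks of ten bases, with a running base count at the end of each line. A minimal Python sketch of checking that bookkeeping is shown below; the function name `check_listing` and the example lines are hypothetical and are not taken from the listing.

```python
def check_listing(lines, declared_length):
    """Validate the running totals printed at the end of each sequence line.

    Each line is expected to hold space-separated blocks of bases followed
    by the cumulative base count. Returns the total number of bases.
    """
    count = 0
    for line in lines:
        *blocks, total = line.split()
        count += sum(len(block) for block in blocks)
        if count != int(total):
            raise ValueError(f"running total {total} but counted {count}")
    if count != declared_length:
        raise ValueError(f"declared {declared_length} but counted {count}")
    return count


# Hypothetical 63-base entry laid out like the listing (not a real sequence).
EXAMPLE_LINES = [
    "ACGTACGTAC GTACGTACGT ACGTACGTAC ACGTACGTAC GTACGTACGT ACGTACGTAC 60",
    "CGC 63",
]
print(check_listing(EXAMPLE_LINES, 63))
```

A mismatch between a printed total and the counted bases raises an error, which makes dropped or duplicated characters in a transcribed entry easy to locate.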



( 2 ) INFORMATION FOR SEQ ID NO:3:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 163 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:3:
CTTCCCTCTC TTGCTTCCCT TTCCCAACCA CACGTOCTCA CTCCATOCCC ACCGCCACCA 
CACCCCCACA GCCAOTACTG CCAOACTACT GCTGATGTTC TCTTAAGGCC CAOOOAOTCT 
CAACCAOCTC GTOGTGAATG CTGCCTOCCA CGGGACCCCC CCC 



6 0 
t 20 
t 6 3 



( 2 ) INFORMATION FOR SEQ ID NO:4:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 173 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:4:
TTTTATTTOT AAATATATOT ATTACATCCC TACAAAAAGA ATCCCACGAT TTTCCCTCCT 
OTGTCTTTTC OTCTTOCTTC TTCATGGTCC ATOATGCCAO CTCACCTTOT CACTACAATG 
AAACCAAACT CCCCCGATGO AAGCAOATTA TTCTOCCATT TTTCCAGCTC TTT 



6 0 
1 2 0 
I 73 



( 2 ) INFORMATION FOR SEQ ID NO:5:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 197 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:5:
GGCTOOACAT GGGTOCCCTC CACGTCCCTC ATATCCCCAC CCACACTCTC GCCTCAGCTT 60 
TTCCCCTGCC CATOTCATCT ACCTGGACTC OGCCCTCCCC TTCTTCAOCC TTOAATCAAA 120 
ACCCACTTTO TTACCCGAGC ATTTCCCAGA CCACTCATCA CATTAAAAAA TATTTTOAAA ISO 
ACAAAAAAAA AAAAAAA ^91 

( 2 ) INFORMATION FOR SEQ ID NO:6:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 132 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:6:

TTCACAAAGC TGTTTATTTC CACCAATAAA TAOTATATGO T 0 A T TOCCGT TTC T A T TT A T 60 

aagagtagtc gctattatat ooootatcat ottgatgctc ataaatagtt catatctact lao 

TAATTTGCCT TC 1^2 

( 2 ) INFORMATION FOR SEQ ID NO:7:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 75 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:7:
CAACAGACTT OTATCTACAA CCCCAACAOG CAAOGCAOCT AAATOCAQAO GCTACAGAGA 60 
CATCCCGACG GAATT '5 

( 2 ) INFORMATION FOR SEQ ID NO:8:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 131 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:8:

OOATGGAAAC ATOTAOAAOT CCAQAOAAAA ACAATTTTAA AAAAAOGTGG AAAAOTTACO 00 

/ 

CCAAACCTGA GATTTCAOCA TAAAATCTTT AGTTAGAACT GACAGAAAGA AGAOGGAGGC 120 

TGGTTGCTOT TGCACCTATC AATAGGTTAT C 1^1 

( 2 ) INFORMATION FOR SEQ ID NO:9:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 141 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:9:
TTCTTOATCT TTACAACACT ATCAATAOOO AAAAAAGAAA AAACTCTTCA AAATAAAATC . 60 

TACGAGCCCT OCTTTTGGAA TGCTTCAGTO AGCAGCTCAA CAAGTCCTCT CCCAACAAAC 120 
CAATGATAAA ACTTOACAAA A 

( 2 ) INFORMATION FOR SEQ ID NO:10:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 162 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:10:
ACCCATTTCT AACAATTTTT ACTOTAAAAT TTTTCOTCAA AGTTCTAAOC TTAATCACAT 60 
CTCAAAOAAT AOACOCAATA TATAOCCCAT CTTACTAOAC ATACAOTATT AAACTOOACT IJO 
CAATATOAGG ACAAOCTCTA CTGGTCATTA AACCCCTCAG AA 162 

( 2 ) INFORMATION FOR SEQ ID NO:11:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 157 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:11:
ACATATATTA ACaCCATTCA TTTGCCCAAA ATCTACACGT TTCTAGAATC CTACTGTATA 60 
TAAACTGGGA ATGTATCAAG TATACACTAT GAAAGTGCAA ATAACAAOTC AAGOTTAOAT 120 
TAACTTTTTT TTTTTACATT ATAAAATTAA CTTCTTT 

( 2 ) INFORMATION FOR SEQ ID NO:12:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 75 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:12:
CCAAATTTC7 CTOCAATCCA TCCTCCCTCC C ATCACC ATA GCCTCGAGAC CTCATTTCTO «0 
TTTGACTACT CCAGC '5 

( 2 ) INFORMATION FOR SEQ ID NO:13:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 124 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:13:
AACTAACCTC CTCOCACCCC TOCCTCACTC ATTTACACCA ACCACCCAAC TATCTATAAA 60 
CCTGACCCAT CCCCATCCCT TATGAOCOGC GCAOTOATTA TAOGCTTTCO CTCTAAOATA 120

( 2 ) INFORMATION FOR SEQ ID NO:14:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 151 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:14:
ATTATTATTC TTTTTTTATC TTACCTTAOC CATCCAAAA7 TTACTOCTOA AOCAOTTAAT 60 
AAAACACACA TCCCATTGAA GGGTTTTGTA C A TTTC A C T C. C T T A C A A A T A ACAAAOCAAT 120 
GATAAACCCO GCACGTCCTC ATAGOAAATT C . 151 

( 2 ) INFORMATION FOR SEQ ID NO:15:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 105 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:15:
CGTGACACAA ACATGCATTC OTTTTATTCA TAAAACACCC TCCTTTCCTA AAACAATACA 60 
AACACCATGT TCATCAOCAO OAAOCTGGCC GTCGCCAOOC OOOCC 105 

( 2 ) INFORMATION FOR SEQ ID NO:16:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 246 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:16:

ATAOGTTAOA TTCTCATTCA COOGACTACT TACCTTTAAG CACCCTACAO CACTAGGCTA 60 

ATCTGACTTC TCACTTCCTA AGTTCCCTCT TATATCCTCA AGGTACAAAT GTCTATOTTT 120 

TCTACTCCAA TTCATAAATC TATTCATAAG TCTTTGGTAC AAOTTACATG ATAAAAAGAA ISO 

ATGTGATTTG TCTTCCCTTC TTTGCACTTT TGAAATAAAG TATTTATCTC CTOTCTACAO 240 
T.TTA AT . 24 6 

( 2 ) INFORMATION FOR SEQ ID NO:17:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 188 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:17:
GTCCAGTATA AAOGAAAGCG TTAAGTCOGT AACCTAGACC ATTGTAAATA TCTTTTATGT 60 
CCTCTAOATA AAACACCCGA TTAACAGaTG TTAACCTTTT ATGTTTTGAT TTGCTTTAAA 120 
AATGCCCTTC TACACATTAO CTCCAOCTAA AAAGACACAT TGAOAGCTTA CAGOATAGTC ISO 
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TCTCCAOC IBS 

( 2 ) INFORMATION FOR SEQ ID NO:18:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 212 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:18:

CCACTTCCAA OGOACTTOOT CTOCTATTTT TOAAOCACAT OTGOTOATAC TOAOATTOTC 60 

TCTTCAOTTT CCCCATTTCT TTOTGCTTCA AATCATCCTT CCTACTTTOC TTCTCTCCAC 120 

CCATOACCTT TTTCACTGTO OCCATCAAGG ACTTTCCTCA CACCTTCTOT ACTCTTAGGC ISO 

TAAOACATGT CACTACACCC TGCCCCTOAC TO 212 

( 2 ) INFORMATION FOR SEQ ID NO:19:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 203 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:19:

TGTTAGTTTT TAGOAACGCC TOTCTTCTCC CAGTGAGCTT TATTAGTCCA CTTCTTOGAG 60 

CTAOACCTCC TATACTTACT CACTCCCCAT CGTGAAAOAG GGAGAACAGC AAGGGCCAAG 130 

GCAAGOCCTC TTTGCTACTA TCTCCATTTC TAGAAOATGC TTTAGATGAT AACCACAGGT 180 
CTATATCAGC ATAOTAAGGC TGT 203 

( 2 ) INFORMATION FOR SEQ ID NO:20:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 177 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:20:

CCTATTTCTG ATCCTGACTT TGCACAACCC CCTTCAGCCA C A A C A C TG A C A A A G T C ATC C 60 

TCCGTCTACC AOAGCCTCCA CTTCTGATCC T A AA A T A A G C T TC A T C T CCG CCTGTGCCTT 120 

GGGTCGAACC CCCACCATTC TCCACCTGCT TTTGCATTTC TCTTCCTAAA TTTCATT 111 

( 2 ) INFORMATION FOR SEQ ID NO:21:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 106 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:21:
CGCAGCCTAC CTCTGTTTAT TCCTCTACAA ATCATTACAA AACCAAGTCT OGGGCAGTCA 60 
CCOCCCCCAC CCATCACCCC AGTCCAATGG CTACCTGCTG CCCTTT |06 
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6 0 
] 2 0 
1 3 9 



( 2 ) INFORMATION FOR SEQ ID NO:22:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 159 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:22:
TTAOTTCAOT CAAAOCAOOC AACCCCCTTT OOCACTGCTG CCACTGOOOT CATOOCGOTT 
GTGGCAGCTC COCAGGTTTC CCCAACACCC TCCTCTGCTT CCCTGTOTOT CCOOGTCTCA 
OGAOCTOACC CAGAGTOOA 

( 2 ) INFORMATION FOR SEQ ID NO:23:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 177 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:23:
GCTGAATOTT TAAOAGACAT TTTGGTCTTA AAGCCTTCAT CATOAAAOTG TACATGCATA 
TGCAAOTCTG AATTACCTCG T A T G G A TGCT TO CT TG T TT A TTAACTAAAC ATCTACACCA 
AACTCCCCOT TTAOAGTCCT CTTAATATTO ATGTCCTAAC ACTGGGTCTG CTTATGC 

( 2 ) INFORMATION FOR SEQ ID NO:24:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 167 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:24:

cccagtcgga tatooaatcc agaagogaaa caaocactcg ataattaaaa acaoctgggg 

AGAAAACTGG OGAAACAAAG CATATATCCT CATCGCTCGA AATAAGAACA ACGCCTCTGG 
CATTGCCAAC CTOOCCAGCT TCCCCAAGAT CTGACTCCAO CCAGAAA 

( 2 ) INFORMATION FOR SEQ ID NO:25:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 151 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:25:
GCCAGGGCGG ACCOTCTTTA TTCCTCTCCT CCCTCACAGG TCAGGAACCA GGTCTGGCAG 
CACCTOCAGT GGGCCCTAGT CATCTGTGGC AGCCAAGOTG AAOGCACTCA CCTTGTCGCC 
COTCCCTOAO TAOAACTTOT TCTOGAATTC C 



6 0 
] 2 0 

1 7 ^ 



6 0 
1 3 0 
1 6 7 



6 0 
1 2 0 
1 5 1 



( 2 ) INFORMATION FOR SEQ ID NO:26:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 156 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:26:
AACTCTTTCA CACTCTOOTA TTTTTAOTTT AACAATATAT CTOTTGTOTC TTGCAAATTA 00 
OTTCATATCA ATTCATATTO AOCTOTCTCA TTCTTTTTTT AATCCTCATA TACAOTAOTA 120 
TTCAATTATA ACAATATATC CTAATACTTT TTAAAA )56 

( 2 ) INFORMATION FOR SEQ ID NO:27:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 150 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:27:
COATAACAAA G A A G CC CTO A 0 O 0 C T A CCG G CC C C G C C T GC CCTGCCTCTC ACTCCTGGGA 60 
CCCACCAGCC CGCACACOTT CAGAGGCCCA CTTCCTCTTC CTTACGTTGG TGACCATCTG J 20 

CTCCTCCTTC GCCGCTGGAC AOCCACAAAA 150 

( 2 ) INFORMATION FOR SEQ ID NO:28:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 212 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:28:

CCACTTCGAA CCGACTTOOT OTCCTATTTT TCAAGCAGAT CTGCTGATAC TGACATTCTC 60 

TCTTCAGTTT CCCCATTTGT TTOTOCTTCA AATOATCCTT CCTACTTTGC TTCTCTCCAC 120 

CCATGACCTT TTTCACTCTC GCCATCAAGG ACTTTCCTCA CACCTTGTGT ACTCTTAGGC ISO 

TAAGAGATCT OACTACACCC TOCCCCTOAC TC 2 12 

( 2 ) INFORMATION FOR SEQ ID NO:29:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 157 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:29:
ATCCCTGGCT CTGOATAOTG CTTTTGTOTA GCAAATGCTC CCTCCTTAAG GTTATACGGC 60 
TCCCTCACTT TGCGAGTGTG CAAGTACTAC TTAACTGTCT CTCCTGCTTC CC.TGTCOTTA 120 
TCCTTTTCTO OTOATGTTOT OCTAACAATA AGAATAC 157 

( 2 ) INFORMATION FOR SEQ ID NO:30:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 152 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:30:
COCTCCOCAT CCCTCTCCTC CTCCATCCCC ATACATCACC ACCTCTAATC TTTACAAACG 60 
CTOCCACCCC OOCTCTCAAG CCAAGOOCCO TCCOTOCCAC OOTOOCTOTO AOTATTCCTC IJO 
COTTAOCTTT CCCATAACCT TOCAGTATCT CC 132 

( 2 ) INFORMATION FOR SEQ ID NO:31:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 90 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( ii ) MOLECULE TYPE: DNA (genomic)

( xi ) SEQUENCE DESCRIPTION: SEQ ID NO:31:
CCAACTCCTA CCOCCATACA GACCCACACA OTGCCATCCC TGACAGACCA GACCGCTCCC 60 
CAATACTCTC CTAAAATAAA CATOAAOCAC 90 
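
The running position printed at the end of each sequence row can be checked mechanically against the number of bases in that row. The following Python sketch is illustrative only: it assumes rows made of space-separated base blocks followed by a cumulative count, and the helper name `check_running_totals` is hypothetical rather than anything defined in the listing.

```python
def check_running_totals(rows):
    """Return (reported, computed) cumulative lengths for each listing row.

    Each row is assumed to hold space-separated base blocks followed by a
    running position count, as in the sequence listing.
    """
    results = []
    total = 0
    for row in rows:
        *blocks, reported = row.split()
        total += sum(len(block) for block in blocks)
        results.append((int(reported), total))
    return results


# Rows of SEQ ID NO:31 as printed (scanned characters left untouched).
SEQ31_ROWS = [
    "CCAACTCCTA CCOCCATACA GACCCACACA OTGCCATCCC TGACAGACCA GACCGCTCCC 60",
    "CAATACTCTC CTAAAATAAA CATOAAOCAC 90",
]

for reported, computed in check_running_totals(SEQ31_ROWS):
    print(reported, computed, "ok" if reported == computed else "MISMATCH")
```

A mismatch between the two numbers would point to a dropped or duplicated character in the row, or to a misread position count.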

( 2 ) INFORMATION FOR SEQ ID NO:32:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 43 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:32:
CATOCATCAA TCTCTCATCG TCCCAAGCAA CATOOTACAT TTC 43

( 2 ) INFORMATION FOR SEQ ID NO:33:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 2333 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:33:

ACACACCTCT GCCCTCACCA TGAOCCTCTC GCAGCCCCTC OTCCTOOTOC TCCTOCTOCT 60
OGOCTGCTGC TTTGCTCCCC CCACACAOCO CCAGTCCACC CTTOTOCTCT TCCCTGGAGA 120
CCTCAQAACC AATCTCACCG ACAGGCAOCT OGCAOAOOAA TACCTOTACC OCTATOOTTA 180
CACTCCOCTG GCAGaGATOC GTOOAOAOTC OAAATCTCTC CGGCCTCCGC TOCTCCTTCT 240
CCACAAGCAA CTOTCCCTOC CCOACACCGO TOAOCTGOAT AOCOCCACGC TOAAGGCCAT 300
GCCAACCCCA COCTOCGGGG TCCCAGACCT GGOCAGATTC CAAACCTTTC AGGCCGACCT 360
CAAOTCGCAC CACCACAACA TCACCTATTO OATCCAAAAC TACTCOGAAG ACTTCCCCCC 420
CGCGGTGATT OACGACGCCT TTOCCCGCGC CTTCGCACTG TGGAGCGCGG TGACCCCCCT 480
CACCTTCACT COCGTGTACA GCCGCOACOC ACACATCGTC ATCCAGTTTG GTGTCCCGGA 540
GCACGCACAC GOGTATCCCT TCGACGGGAA GGACGGGCTC CTCOCACACC CCTTTCCTCC 600
TOOCCCCOGC ATTCAGCCAG ACOCCCATTT CGACGATCAC CAGTTGTOOT CCCTCCGCAA 660
COCCOTCCTC GTTCCAACTC OCTTTGCAAA CCCaCATGGC GCGGCCTGCC ACTTCCCCTT 720
CATCTTCGAC GGCCOCTCCT ACTCTGCCTG CACCACCGAC GGTCGCTCCG ACGGGTTGCC 780
CTGCTGCAOT ACCACGOCCA ACTACGACAC CCACGACCGO TTTCGCTTCT GCCCCAOCOA 840
gacactctac ACCCOOOACG OCAATOCTOA TOGOAAACCC TOCCAOTTTC CATTCATCTT 900
CCAACOCCAA TCCTACTCCG CCTOCACCAC CGACGGTCCC TCCGACCGCT ACCGCTGGTG 960
cgccaccacc CCCAACTACG ACCCGOACAA OCTCTTCGGC TTCTGCCCGA CCCGAGCTOA 1020
CTCGACOOTG atooooooca ACTCGGCGGG goaoctgtgc GTCTTCCCCT TCACTTTCCT 1080
GGGTAAGGAG tactccacct OTACCAGCGA GGGCCGCOCA GATGOCCGCC TCTOOTGCGC 1140
taccacctcg AACTTTCACA CCOACAACAA OTOCGOCTTC TGCCCCOACC AAOOATACAG 1200
TTTGTTCCTC CTCGCOOCGC ATGAGTTCGG CCACGCOCTG GOCTTAGATC ATTCCTCACT 1260
OCCGGAGGCC CTCATCTACC CTATGTACCG CTTCACTGAC CGGCCCCCCT TCCATAAOGA 1320
CCACGTOAAT OOCATCCGCC ACCTCTATCC TCCTCGCCCT CAACCTOAGC CACOGCCTCC 1380
AACCACCACC ACACCCCACC CCACGCCTCC CCCCACGOTC TCCCCCACCC GACCCCCCAC 1440
TGTCCACCCC TCAGAGCCCC CCACAOCTCC CCCCACAGGT CCCCCCTCAG CTGOCCCCAC 1500
AGOTCCCCCC ACTGCTGGCC CTTCTACGGC CACTACTOTG CCTTTOAGTC CGGTGGACGA 1560
TGCCTCCAAC CTOAACATCT TCGACOCCAT CCCGGAGATT GGGAACCACC TOTATTTGTT 1620
CAAOOATCGC AAOTACTGGC GATTCTCTOA GGGCAOGGGG AGCCOGCCGC ACGGCCCCTT 1680
CCTTATCOCC GACAACTGGC CCOCGCTGCC CCGCAAOCTO CACTCCCTCT TTGAGOAGCC 1740
GCTCTCCAAG AAGCTTTTCT TCTTCTCTGC GCGCCAGGTC TCGQTCTACA CAGGCGCGTC 1800
GCTOCTGOCC CCOAOCCGTC TCGACAAGCT GCGCCTGGGA CCCGACC7CG CCCAGCTGAC 1860
CCGGGCCCTC CGGAGTCGCA GGGGGAAGAT GCTGCTGTTC AGCGCGCGGC GCCTCTGGAC 1920
GTTCOACCTC AAGCCGCAGA TCCTGCATCC CCCCAGCGCC AOCOAOGTGG ACCGCATOTT 1980
CCCCCGCCTG CCTTTCOACA CCCACOACGT CTTCCAGTAC CGAGAGAAAG CCTATTTCTG 2040
CCAGGACCCC TTCTACTCCC OCGTOAGTTC CCGGAGTGAC TTGAACCACG TOGACCAAOT 2100
CGGCTACGTC ACCTATOACA TCCTGCACTO CCCTGACGAC TAGOOCTCCC OTCCTGCTTT 2160
CCAGTGCCAT OTAAATCCCC ACTCCCACCA ACCCTCCGOA AOGAGCCAGT TTGCCGGATA 2220
CAAACTGCTA TTCTCTTCTG GAGGAAAGCC AGGAOTGOAC CTGGOCTGGC CCCTCTCTTC 2280
TCACCTTTOT TTTTTGTTCO AGTCTTTCTA ATAAACTTGG ATTCTCTAAC CTTT 2334



( 2 ) INFORMATION FOR SEQ ID NO:34:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 18 amino acids
( B ) TYPE: amino acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: unknown

( i i ) MOLECULE TYPE: peptide

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:34:

Glu Ala Leu Met Tyr Pro Met Tyr Arg Phe Thr Glu Gly Pro Pro Leu
1               5                   10                  15

His Lys

We claim:

1. An isolated osteoclast-specific or -related DNA sequence, or its
complementary sequence, the DNA sequence comprising a nucleic acid
sequence selected from the group consisting of:

a) DNA sequences set forth in the group consisting of SEQ ID NOS. 12,
14, 16 and 17, or their complementary strands; and

b) DNA sequences which hybridize under standard conditions to the DNA
sequences defined in a).

2. A DNA construct capable of replicating, in a host cell,
osteoclast-specific or -related DNA, said construct comprising:

a) a DNA sequence of claim 1; and

b) sequences, in addition to said DNA sequence, necessary for
transforming or transfecting a host cell, and for replicating, in a
host cell, said DNA sequence.

3. A DNA construct capable of replicating and expressing, in a host
cell, osteoclast-specific or -related DNA, said construct comprising:

a) a DNA sequence of claim 2; and

b) sequences, in addition to said DNA sequence, necessary for
transforming or transfecting a host cell, and for replicating and
expressing, in a host cell, said DNA sequence.

4. A cell stably transformed or transfected with a DNA construct
according to claim 3.

5. A cell stably transformed or transfected with a DNA construct
according to claim 4.

* * * * *
« * * « « 
1 CCCAQQGCGC CCTAQQCQGT QCATCCCGTT CGCQCCTQQG QCTCTQCTCT
51 TCCCQCGCCT'GAGQCQQCGG CQQCAQGAQC TGAQGQGACT TSTAGQGMC 
101 TSAQQQGAQC TGCTGTCTCC CCCQCCTCGT CCTCCCCATT TGCQCQCTCC 
151 CQGGACCATG TCCGCGCTQG CGGGTGAAGA TCTHHSAGG TGTCCAQQCT 
201 GTBQQGACGA CATTQCTIXA /U3IXAGATAT QCTAEAGGAC TGTCAAC6AA 
251 ACCTQQCAC6 GCTCtTQCTT CCQGTGAAAG IWKGCAG (XnOGACCAC 
301 CCCAATCTGC TCAAGTTCAT TQCTGTQCTG TACAAGGWTA AGAAQCTBAA 
351 CCreCT^CA GAGTACATra AGQQGQGCAC ACTSAAQGW: TTTCTQCGCA 
401 CTATQGftTCC GTTCCCCTGG CAQCAGAAfiG "TCAfiGnTGC CAAAfiGAATC 
451 QCaCCQGAA TGGACAAGAC TUreGTGGTC QCAGACTTTS GQCTCTCACG 
501 QCTCATAGTC GAAGAGAGGA AAAGGQCCCC CATGGAGAAG GCCACCACCA 
551 AGAAACQCAC CTTQCQCAAG AACGACCGCA AGAAQCQCTA CACGGTBGriS 
601 QGAAACCeCT ACTQGATQQC CCCTGAGATC CTGA ACGGAA AGAQCTATGA 
651 TSftGACQGre GATATCTTCT CCTTTGOGAT CGTTCTCTCT GAGATCATTS 
701 QQCAQGTGTA TBCAGATCCT GAC7GCCTTC CCCGAACACT GGACTTT BGC 
751 CTCAACGTGA AQCTTTTCrG GGAGAAGFTT GTTCCCACAG ATTGTCCCCC 
801 GQCCTTCTTC CGQGTGQCXG CCATCTCCTG CAGACTGGAG CCTGAGAGCA 
851 GACCAQCA1T CTCGAAAT7G GAQGACTCCT 7TGAQQCCCT CTCCCTCTAC 
901 CTBQGQGAGC TBQQCATCCC GCTGCCTCCA GAGCTCGAQG AGTTGGACCA 
951 CACTCTCAGC ATQCAGTACG GCCTGACCCG QGACTCACCT CCCTAQCCCT 
1001 GQCCCAQCCC CCTBCAQQGG GGreTTCTAC AQCCAGCATT QCCCCTCTGT 
1051 QCCCCAnCC TQCTCTGAGC AGOGCCGPCC GeQCTTCCTG TQGATraOCG 
1101 6AATGTTTAG AAGCAGAACA AACCATTCCT AnACCTCCC CAQGAQQCAA 
1151 GTOOQCGCAG CACCAQGGAA ATGTATCTCC ACAQSTTCTG GtSQCCTAGIT 
1201 ACTmcrGTA AATCCAATAC TTQCCTGAAA GCTGTGAAGA AGAAAAAAAC 
1251 CCCTQQCCTT TBGQCCAQGA QGAATCTGTT ACTCGAATCC ACCCAGGAAC 
1301 TCCCTGQCAG TGGATTGTGG GAQGCTCTTG CTTACACTAA TCAQCGHGAC 
1351 CTGGACCTQC TQQQCAGGAT CCCAGQGTSA ACCTOCCTCT 6AACTCTGAA 
1401 SrCACTAGTC CAQCTGGGTG CAQGAQGACT TCAAGTGreT 6GAC6 AAAGA 
1451 AAGACTGAT6 QCTCAAAQG6 TGTCAAAAAG TCAGTGATOC TCC CCCTTH : 
1501 TACTCCAGAT CCTGrCCTTC CTGGAQCAAG GOGAGOGAG TAGGrTTTCA 
1551 AGAGrcCCTT.AATATGTGGr QGAACAQQCC AQGAGTTAGA GAAAQQQCTG 
1601 GCIICIGIII ACCTOCTCAC TG GCTCTA6C CAGCCCAQQG ACCACATCAA 
1651 TGmGAGAQGA AQCCTCCACC TCAI GIIIIl AAACTTAATA CTQGAGACTC 
1701 QCTGAGAACT TACQGACAAC ATCCI ITCTG TCTCAAACAA ACAGTCACAA 
1751 QCACAQGAAG AGQCTGQQQG ACTAGAAAGA QQCCC7QCCC TCTAGAAAQC 
1801 TCAGATCTTC QCTTUTGTTA CTCATACTCG QGIGQQCTCC TTAGTCAGAT 
1851 GCCTAAAACA TTTTBCCTAA AGCTCGATQG GTrCTOGAQG ACAfiFGTOQC 
1901 TUGJCACAQG CCTAGAGTtT GAQGGAQBQG AGTQQGAGTC TCAQCAATCT 
1951 ClTGGTCTre QCTTCATBQC AACCACTCCT CACCCTTCAA CATQCCTGGT 
2001 TTAGQCAGCA QCTTQQQCTC QGAAGAGGTG GrOQCAGAGT CTCAAAQCTG 
2051 AGATGCTGAG AGAGATAQCT CCaGAQCTG GQCCATCTCA CTTCTACCTC 
2101 CCATCTTTQC TCTCCCAACT CATrAQCTCC TQQQCAGCAT CCTCCrMQC 
2151 CACATCTQCA QGTACTBGAA AACCTCCATC nOQCTCCCA GAQCTCTAGG 
2201 AACTCTTCAT CACAACTAGA TTTQCCTCTr CTAAGTGTCT ATGAQCTTQC 
2251 ACCATATTTA ATAAATTQGG AATQQGnTG GGGTATTAAA AAAAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA (SEQ ID N0:1) 

FIG.1A 
FEATURES:
5'UTR:        1-228
Start Codon:  229
Stop Codon:   994
3'UTR:        997

Homologous proteins:

Top 10 BLAST Hits
                                                                    Score  E
CRA|1000682328847 /altid=gi|8051618 /def=ref|NP_057952.1| LIM d...  485    e-136
CRA|18000005015874 /altid=gi|5031869 /def=ref|NP_005560.1| LIM ...  485    e-136
CRA|88000001156379 /altid=gi|7434382 /def=pir|JC5814 LIM motif...   469    e-131
CRA|88000001156378 /altid=gi|7434381 /def=pir|JC5813 LIM motif...   469    e-131
CRA|18000005154371 /altid=gi|7428032 /def=pir|JE0240 LIM kinas...   469    e-131
CRA|18000005126937 /altid=gi|6754550 /def=ref|NP_034848.1| LIM ...  469    e-131
CRA|18000005127186 /altid=gi|2804562 /def=dbj|BAA24491.1| (AB00...  469    e-131
CRA|18000005127185 /altid=gi|2804553 /def=dbj|BAA24489.1| (AB00...  469    e-131
CRA|18000005004416 /altid=gi|2143830 /def=pir|I78847 LIM motif      468    e-131
CRA|18000005004415 /altid=gi|1708825 /def=sp|P53670|LIK2_RAT LI...  468    e-131

BLAST dbEST hits:
                                            Score  E
gi|10950740 /dataset=dbest /taxon=96...     1049   0.0
gi|10156485 /dataset=dbest /taxon=96...     975    0.0
gi|5421647 /dataset=dbest /taxon=9606 ...   952    0.0
gi|10895718 /dataset=dbest /taxon=96...     757    0.0
gi|13043102 /dataset=dbest /taxon=960...    714    0.0
gi|519615 /dataset=dbest /taxon=9606 /...   531    e-149
gi|11002869 /dataset=dbest /taxon=96...     511    e-143

EXPRESSION INFORMATION FOR MODULATORY USE:
library source:
From BLAST dbEST hits:
gi|10950740  teratocarcinoma
gi|10156485  ovary
gi|5421647   testis
gi|10895718  nervous_normal
gi|13043102  bladder
gi|519615    infant brain
gi|11002869  thyroid gland

From tissue screening panels:
Fetal whole brain



FIG.1B 
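
The feature coordinates listed for FIG.1B can be cross-checked against the length of the encoded protein. The short Python sketch below is offered only as an illustration: it assumes that the start and stop codon entries give the first base of each codon, and the function name `coding_summary` is hypothetical.

```python
# Feature coordinates as listed for FIG.1B (1-based nucleotide positions).
UTR5_END = 228
START_CODON = 229   # assumed: first base of the start codon
STOP_CODON = 994    # assumed: first base of the stop codon
UTR3_START = 997


def coding_summary(start, stop):
    """Return (coding bases, whole codons, leftover bases) before the stop codon."""
    coding_nt = stop - start
    codons, remainder = divmod(coding_nt, 3)
    return coding_nt, codons, remainder


coding_nt, codons, remainder = coding_summary(START_CODON, STOP_CODON)
print(coding_nt, codons, remainder)
print(UTR5_END + 1 == START_CODON, STOP_CODON + 3 == UTR3_START)
```

Under these assumptions the coding region spans 765 bases, or 255 codons with no remainder, which agrees with the 255 residues shown for SEQ ID NO:2, and the untranslated regions abut the coding region without gaps.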
1 MVQDCQRNLA RLLLPVKVMR SLDHPNVLKF IGVLYKDKKL NLLTEYIEQG
51 TLKDFLRSMD PFPWQQKVRF AKGIASGMDK TVWADFGLS RLIVEERKRA 
101 PMEKAT7KKR TLRKNDRKKR mVGNPYWM APEMLNGKSY DETVDIFSFG 
151 IVLCEIIQQV YADPDCU'RT LDFGUflflCLF WEKFVimP PAFFPL^^ 
201 CRLEPESRPA FSKLEDSFEA LSLYLGEL6I PLPAELEEID HTVSMQYGLt 
251 RDSPP (SEQ ID N0:2) 



FEATURES:

Functional domains and key regions:

[1] PDOC00004 PS00004 CAMP_PHOSPHO_SITE
cAMP- and cGMP-dependent protein kinase phosphorylation site

Number of matches: 2
1 108-111 KKRT
2 119-122 KRYT

[2] PDOC00005 PS00005 PKC_PHOSPHO_SITE
Protein kinase C phosphorylation site

Number of matches: 4
1 51-53 TLK
2 106-108 TTK
3 107-109 TKK
4 111-113 TLR

[3] PDOC00006 PS00006 CK2_PHOSPHO_SITE
Casein kinase II phosphorylation site

Number of matches: 4
1 51-54 TLKD
2 76-79 SGMD
3 139-142 SYDE
4 212-215 SKLE

[4] PDOC00008 PS00008 MYRISTYL
N-myristoylation site

Number of matches: 4
1 73-78 GIASGM

FIG.2A 
2 77-82 GMDKTV
3 150-155 GIVLCE
4 158-163 QQVYAD
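
The phosphorylation-site matches listed in FIG.2A and FIG.2B are the kind of result produced by scanning the protein for short consensus patterns. The Python sketch below is a minimal illustration, not the tool used to prepare the figures. The regular expressions are the commonly published PROSITE consensus patterns for these entries, stated here as an assumption, and the fragment is residues 101-122 as read from the FIG.2A sequence row and the listed matches; `scan` is a hypothetical helper name.

```python
import re

# PROSITE-style consensus patterns written as regular expressions (assumed).
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",   # PS00004: [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # PS00006: [ST]-x(2)-[DE]
}


def scan(sequence, pattern, first_position=1):
    """Return (begin, end, match) for every hit, including overlapping ones."""
    hits = []
    # A lookahead lets overlapping sites such as 106-108 and 107-109 both match.
    for m in re.finditer(f"(?=({pattern}))", sequence):
        begin = m.start() + first_position
        hits.append((begin, begin + len(m.group(1)) - 1, m.group(1)))
    return hits


# Residues 101-122, assumed from the FIG.2A sequence row and the listed matches.
FRAGMENT = "PMEKATTKKRTLRKNDRKKRYT"

for name, pattern in PATTERNS.items():
    print(name, scan(FRAGMENT, pattern, first_position=101))
```

On this fragment the sketch reports the two cAMP/cGMP-dependent kinase sites and the three protein kinase C sites that fall between residues 101 and 122, and no casein kinase II site, in line with the figure.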

Membrane spanning structure and domains:

Helix  Begin  End  Score  Certainty
1      142    162  0.872  Putative
2      184    204  0.652  Putative

BLAST Alignment to Top Hit:
>CRA|1000682328847 /altid=gi|8051618 /def=ref|NP_057952.1| LIM
domain kinase 2 isoform 2b [Homo sapiens] /org=Homo
sapiens /taxon=9606 /dataset=nraa /length=617

Length = 617

Score = 485 bits (1235), Expect = e-136

Identities = 241/265 (90%), Positives = 241/265 (90%), Gaps = 22/265 (8%)

Ouerv 13 LLPVkVMRSlI)HPNVlJ(FimYI^^ 

Query. 13 [i^^j^jjgjg^f JJ^^JgyLY^ : 

Sbjct: 353 LTeSShpK^^ 

n..prv 73 GIAS9i----"-------"------DiaVWADFGLS^^ 

Query. . . oiOVWADFGLSRLIVEERKRAPMEKA-nKKR 

Sbjct: 413 SASaiAYlJ1SMCIIHRDUJS1NaiKlJ)iaVWADF^ 472 

Ouerv 111 TlJ«NDRmrTWGNPYVWE^lNGKSYDE7VDIRFGIVLCEIIQQV^^ 170 
Query, ni ihJJJjS^^l^gjjpYy^ 

Sbjct: 473.SK!nVV^ 

niierv. 171 LDFGLIMIJlCKFVPTTXyPAFFPLAAICCR^ 230 

SfSwSKtdcppaffplmicc^ 

Sbjct: 533 SfgSvKIJ^^ 

Query: 231 PLP/\ELEEIJDHTVSMQYGLTO)SPP 255 

PLPAELEELDHTVSMQYaTTOSPP 
Sbjct: 593 PLPAELEELOmVSMQYGLTRDSPP 617 (SEQ ID N0:4) 
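
The alignment statistics can be checked against the coordinates printed beside the alignment rows. The Python sketch below is illustrative only. It assumes that the percentages are truncated to whole numbers and that all 22 gaps fall in the query row; `truncated_percent` is a hypothetical helper name.

```python
def truncated_percent(count, total):
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    return count * 100 // total


ALIGNMENT_LENGTH = 265
IDENTITIES = 241
GAPS = 22

# Coordinates read from the first and last alignment rows.
query_span = 255 - 13 + 1      # query residues 13-255
subject_span = 617 - 353 + 1   # subject residues 353-617

print(truncated_percent(IDENTITIES, ALIGNMENT_LENGTH))
print(truncated_percent(GAPS, ALIGNMENT_LENGTH))
print(query_span + GAPS == ALIGNMENT_LENGTH, subject_span == ALIGNMENT_LENGTH)
```

The query span of 243 residues plus 22 gap positions equals the 265 aligned columns, and the subject span alone also covers 265 residues, so the counts and coordinates are mutually consistent under the stated assumptions.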

Hmmer search results (Pfam):

Model    Description                                 Score   E-value  N
PF00069  Eukaryotic protein kinase domain            100.1   1.1e-25  2
CE00031  CE00031 VEGFR                               4.9     0.14     1
CE00204  CE00204 FIBROBLAST_GROWTH_RECEPTOR          4.7     1        1
CE00359  E00359 bone_morphogenetic_protein_receptor  1.8     7.9      1
CE00022  CE00022 MAGUK_subfamily_d                   1.5     2.5      1
CE00287  CE00287 PTK_Eph_orphan_receptor             -48.4   3.8e-05  1
CE00292  CE00292 PTK_membrane_span                   -61.8   2.1e-05  1

FIG.2B 
CE00291  CE00291 PTK_fgf_receptor                    -113.0  0.027    1
CE00286  E00286 PTK_EGF_receptor                     -125.1  0.0021   1
CE00290  CE00290 PTK_Trk_family                      -151.3  6.5e-05  1
CE00288  CE00288 PTK_Insulin_receptor                -210.4  0.014    1

Parsed for domains:

Model    Domain  seq-f  seq-t     hmm-f  hmm-t     score   E-value
PF00069  1/2     16     79   ..   41     105  ..   52.1    2.3e-13
CE00022  1/1     124    153  ..   187    216  ..   1.5     2.5
PF00069  2/2     81     156  ..   129    182  ..   48.0    3.1e-12
CE00031  1/1     129    156  ..   1114   1141 ..   4.9     0.14
CE00204  1/1     129    156  ..   705    732  ..   4.7     1
CE00359  1/1     79     157  ..   287    356  ..   1.8     7.9
CE00290  1/1     9      218  ..   1      282  []   -151.3  6.5e-05
CE00287  1/1     1      218  [.   1      260  []   -48.4   3.8e-05
CE00291  1/1     1      218  [.   1      285  []   -113.0  0.027
CE00292  1/1     1      218  [.   1      288  []   -61.8   2.1e-05
CE00288  1/1     1      218  [.   1      269  []   -210.4  0.014
CE00286  1/1     6      218  ..   1      263  []   -125.1  0.0021

FIG.2C
1 TCATCCTTGC GCAGG6GCCA TGCTMCCTT CTGT GTCTCA GTCCAATTTT
51 AAT6TATGT6 CTGCTGAAGC GAGAGTACCA GAGGTmTT T6ATGGCAGT 
101 GACHGAACT TATTTAAAAG ATAAGGAGGA GCCAGTGAGG GAGAGGGGTG 
151 CTGTAAAGAT AACTAAAA6T GCACnCTO TAAGAAGTAA GATGGAATGG 
201 GATCCAGAAC AGGGGTGTCA TACC6AGTAG eCCAGCCTTT GnCCGTQGA
251 CACTGGGGAG TGTAACCCAG AGCT6AGATA GCTTGCAGTG TGGATGAQCC 
301 AGCTGAGTAC AGCAGATAGG GAAAAGAAGC CAAAAATCT6 AAGTAGGQCT 
351 GGQGTGAAGG ACAGGGAAGG GCTAGAGAGA CATTTGGAAA GTGAAACCAG 
401 GTQGATAT6A GAGGAGAGAG TAGAGGGTCT TGATTTCGG6 TCTTTCATGC 
451 HAACCCAAA GCAGGTACTA AAGTATGTGT TGATTCAATG TCTTTGGGTT 
501 TCTCAA6ACT GGAGAAAGCA 6GGCAAGCTC TGGAGGGTAT GGCAATAACA 
551 AGnATCTTG AATATCCTCA T6GTG6AAAG TCCTGATCCT GTTTGAATTT 
601 TGGAAATAGA AATCATTCAG AGCCAAGAGA TTGAATTGTT GAGTAAGTGG 
651 GTGGTCAGGT TACAGACTTA ATTTTG6GTT AAAAAGTAAA AACAAGAAAC 
701 AAGGTGTGGC TCTAAAATAA TGAGATGTGC TGQGGGTGQG GCATSGCAGC 
751 TCATAAACTG ACCCTCAAAG CTCTTACATG TAAGASTTCC AAAAATATTT 
801 CCAAAACTTG GAAGATTCAT TTGGATGTTT GTGTTCATTA AAATCTCTCA 
851 CTAATTCATT GTCTTGTCCA CTGTCCGTAA CCCAACCTG6 GATTGGnTG 
901 AGTGAGTCTC TCAGACTTTC TGCCTTQGAG TTTGTGA^G AGATGGCATA 
951 CTCTGTGACC ACTGTCACCC TAAAACCAAA AAGGCCCCTC TTGACMGGA 
1001 GTCT6AGGAT TTTAGACCCA GGAAGAATGA GTGATGGGCA TATATATATC
1051 CTATTACTGA GGCATGAGAA GAGTGGAATG GGTGGGHGA GGTGGTGnT
1101 TAAGGCCTCT TGCCAGCTTG TrTAACTCTT CTCTGGGGAA C6AGGGGGAC
1151 AACTGTGTAC ATT6GCTGCT CCAGAATGAT GTTGAQCAAT CnGAAGTGC
1201 GAG6AGCTGT GCnTGTCTA TTCATGGCCC CTSTGCCTGT GAAACAGGGT
1251 leGGTGACTG TCACT6TGCC TGTGGCAGTC TGtAGrrACC CAGAGAGAAC
1301 AAAGCTGCAT ACACAGAGCG CACAAGGGAG TCTTGTMCA ACCTTCTCCT
1351 GCTTTCTAGG QCTGAGTCAG GTACCACAGC TTGATCTCAG CTGTCCTCTT
1401 TATTTCAAGA AGTTGACATC TGAGCCATAC CAGGAGTATT GTATTTTG TT
1451 TGAGGCCTCT CTTnTGGAG GAACATGGAC CGACTCTGTG Cl 1 1 ibiCTA
1501 TGCTGGTCTC TGAGCTCACA CAACCCHCA CCCTCCTTTC TCAQCCACTG
1551 ATAGGTAA6T CHCCCTATC TTGCAAGGCT CAGCTCAAGT GTCAQCirCC
1601 TCTACAAAGA CTTTCCTGGT TCCCCTCAn GGAG7GAACA AGAGTrSACA
1651 TGGTAGAATG GAAAGAGCAG AAGCTTTAGA ATGAGCCAGA CCTGAGTATG
1701 AATGCTAGAT CCACCACTTA GCTAGTCAAC CCTGCCCCCT GCCTCAAGTT
1751 TTAATTTTCC TATCCATTAA GTGAATATAA TAATACCTGT GTCACAGGAT
1801 TATTTT6AGA ATTAAATGAG AHAGGTCTA TGAAAQCACC T AGCA6A 6TT
1851 CTTBGCATAT AGGAGGCATT CATTAAATAT TTGrrCTTCC CCTTTTATAC
1901 CCATTACTTT TCnTTTCTG AACTAAAATA ATACTTGGTT CTATCTCT GA
1951 AATAACATCC AA6TGAAAAA TCAACAACAT GAAAGAGCAG M ti i iiCGA
2001 GTG6ATTTGC TTCTTAAlSGA GCAGAGATTA TGTAATCTAA CAGCCTCCAA 
2051 CATACAAA6A GCTTTGTATC TAGAACAGGG GTCCCCAGCC CCTGGACCGC 
2101 CAACTGGTAC GGGTCTGTA6 CCTGTTAGGA ACCAGGCTGC ACAfiCAGGAG 

2151 6TGAGCGGCG GGCCAGTGAG CATTGCTGCC TGAGCJCTGC CTCCTCTCAG 
2201 ATCAGTGGTG GCATTAGATT CTCATAGGAG TGTGAACCCT AnGTGAACT 
2251 GCACATGCAA GQGATCTGGG TTGCATGCTC CTTAT6A6AA TaCACTAAT 
2301 GGCTGATGAT CT6AGTTGGA ACAGITTGAT ACCAAAACCA TCCCCCCGCC 
2351 CCCCAACCCC CAGCCTA6GG TCCGTG6AAA AAnGGCCCC TQCTGCCAAA 
2401 AAQGTTGAGG ACTGCTGATC TAGAG6ACCA ATTTATTCAA TGTTGGTTGA 
2451 6TAAAT6AGC TCTTGGAnA GGT6ATG6AA AAATCTGAAA AAACAGGGGT 
2501 TTTGAGGAAT AGGAAAAGGC AGTAACATGT TTAACCCA6A GAGAAGTTTC
2551 TGGCTGTTGG CTGGGAATAG TCATAGGAAG GGCTGACACT GAAAAGAAGG 
2601 AGATTGTBtT CGTTTCTTCT TCTCAGAGCT ATAAGCAAAG GCTGAAAGTT 
2651 CTAGAAAAAG GCAAGTTTTG TTTCAGTA6A AAAAAGGATA ATCAGAACCA 
2701 TTTTTA(^\AA ATGGAATGAG ACTACTTTTG AGQCCATGAG ITCCtTGTCC 
2751 CT(^GAGAT GAGCAGAGGT TGGACAAGTG XTTACCAGAG ATCTfGTGGA 
2801 GQCAGAAACT GTGCATCTA6 CAGAGCATTG GCCTMCCCt TTCAAATGAG 
2851 ATGCTGTTAA CTCAGTCnA TTCTACATGG T AGISAAT CCT GltCCTTTGC 
2901 CTCCTGCTAC 7TTGGGCCTC TCAACCTCTT GGTTTTGTGT GCAGiaTGAAG 
2951 ATGTCTG6AG GTCiTCCAGGC TGTGGG6ACC ACAHGCTCC AAGCCA6ATA 
3001 TGGTACAGGA CTGTCAACGA AACCTGGCAC GGCTCTTGCT TCCGGTAGGT 
3051 GQGCCTATCC TCCCATCTTT ACCAGTGTAC TATGGGCCAA iSCACTAtTTC 
3101 AT6TTCTGAT GGAAAACACA GAAACAAGCT TaGAGlTGA GAATTTCAAT 
3151 CHAGGGTGG GGAAAQGAAT GTACCAAGGA AGAGCTCATG ACCAAACCTC 
3201 AAGT6TGGCC CCCCTGAACC CAGGTTAAAT TG6AAGAQCC ATAAATGGGC 

3251 CAGCTGGAGG CAQGGTiSGQG GGATGAGAiGG AGCCCTTTCC AGGSTTGTCC 

3301 CATATCCCTC ACmATGGG TGAQGAAACT GAGGCCCAGG AAGAGTGACT 
3351 TTCCTGTGGC TQCACTACA6 ATTATGCAGG TACTTCAAGA UIIGII ICTA 

3401 nctTATnr ATnTATnr AnTTATnr ATnTArmr attttatgag 

3451 AdGGATTCTT GCTGTTGCCC AGGCTG6A6T GC AGTGGT 6C AATCTCGGCT 
3501 CACTGCAATC TCTGCCTGCT GGGTTCAAGT GATTTTTCTG CCT TAGCTT C 
3551 CTGAGTAGCT GAGATGACAG GCACCTGCCA CCATGCGCAG CTAATTTTTG 
3601 TAnTTAGTG GAGACGGGGG TTTCAACATG nGGTCAGQOTGGTCTTGAA 
3651 CTaiGACCT CAAATGATGC ACCCACCTCG ACCTC CCAAA GTGCTG6AAT 
3701 TACAGGCGTG'AACCACTGtG CCCAGCCAAG AGTTGTTTn AGTGTGSTTG 
3751 6CAGAGCCAG CTGnCCTTC ACCACAGGAT GCCTCCCTAG GTrCCTACTT 
3801 TTTGTtACTA GCTnTATTA TAGCTATATT ATTATTATTA tTATTATTAT 
3851 TATTATTAn ATTAHGAGA CAGAGTCTCG CTCTGTCGCC CAGQCTGGTG 
3901 TACAGTBGTG CGATCCCGGG CTCACTGCAA CCTCTGCCTC CCGAGTTCAA 
3951 GCAGTTCTCC TGCCTCAGCC CCCC GAGTAG 6TQGGACTAC AGGGGCCFGC 
4001 CACCACACCC GGCTAATTTT TCTATITTTA 6TAGAGACGG GGTrTCACCT 
4051 TGTTGACCAG GCTGGTCTGG AQCTCCTGAC CTC AGGTAAG TiGCTAGAATC 
4101 ACAGGCGTGA ACCACTGCGC CCAGCCAAGA GnGTTTTTA 6TGTGGTTGG 
4151 CAGAGCCAGC TCHCCTCAC CACAG6TTGC CTCCCTAGGT TCCTACTTTT 
4201 TGTTACTAGC TTTATTATAG CTACAnATT ATTATTAnG TTATTAtfAT 
4251 TGAGACAGAG TCTCGCTCTG TCGCCCAGGC TGGTGTACAG T6ATGT6ATC 
4301 TTGGCTCACT GCAACCTCTG CCCaCGAGT TCAAGCAAtT aCCTGCTTC 
4351 AGCCCCCCTA GTAGGTGGGA CTCCAGGCAC CTGCCACCAC GCCCAGCfAA 
4401 TTmGTAn TTtAGTAGAG GCGGGGTTTC ACGHGnGG CCAGGCTGGT 
4451 CTCAAACTCC TGACCTCAGG TGATCCGCCT GCCTCGGCa CCCAAAATGT 
4501 TOGGATTACA GGCATGAGCC ACCGCGCCCT GCCTATAGCT ACATTAtnT 
4551 TSTAGGCAGC TCAGmrCTT AAAAATTATA CAGACTTGAA ATCAGATITG 
4601 TTCCTGCTGT CTGAGGCTCA GmCTTCAT CTGGAAAATG jGATGGTAATA 
4651 ATCTTGTTGA GATTGAATGA AATAATATAT GCAGTGTATC CA6TACATGG 
4701 TAGACACCCA GTGAATGGTT ATTCCTTCCT CCCATCGGAT TGGAAlTtTC 
4751 AAGGGTGGGA ACTTGTCTTT ATATTCTTCA CAACGTAAAA TAGTT6AAAT 
4801 TTGTTGGTGG AAA6AAGAGC AGTCCACTCC AGAGGCTG6A TGGGCATGCC 
4851 TGGCCCCCAA GGTCTGAAGT GGTAGGGCTG TGCCTATATC CTGAGAATGA 
4901 GATAGACTAG GCAGGCACCT TGTGCTGTAG AHCCAGCTC CTGCACATAG 
4951 CTCnGTTGT AAAACATCCC TGTGOTATA CCAAGTAA7T GA6TTGACCT 
5001 TTAAA(yVCT7 GCCTCrrCCC TGGGAACCAT ATAGGGGA7T QGCCTGGAGA
?0M StctGGCCT CTGGAAGAGT TGGAAAGCAG CCATCAnAT TATCCTTTCC 
5?m mSS AACTCAGAGC TCTCAAGTCT TTTCTGTGGA TCTTATrGCC 
S?l ™fTCTTG CCCCTTTTAC TCCCAGGGAA GTTGATTCTG TCTTTTCTGT 
Sm S^ACT ATGACAQGAG CAGAGAATCT CAGAGCTGTA AGGGACCTTA 

loi IScAM ACCCAATG^ HCACAGATT GGGTCTCGCC ttggqatgta 
ACCCATATGT TCATAircn GCTGTTTrCC TATGTGTATG AATATTTTCT 

5401 Jtccmaa^ AGCAGGACAG GGTAGAGCAA gttaatcttt ggaatttctg 

^151 GATTCTCTTA SgCTAAAAA ACTTCAGAAC TAGAAGAAAC CACCCACTAT 
«m atrttataac CCATTCATAT CACAGATGAG GCCTGAAACC AAAAAGACTT 
TCGA^CAA GAQCTQGCCC TAGCACT6AA aCTTGGGTC 
l^] ™rCAGAr GCTAGCTTGT TAGCTCTGTG CGTGCGnTGTG 

^6?} Stgtc tgtgtctgtg tgtctgagat agagacagaa agataacata 

llll ^ScMA TACATAAAGA GGAAGTAGAC ACGrrAGCAT, ggtagataag 
S?} IrTArAGGCA GKCaSgT GGTGGCTCAC GCCTGTAATC CCAGCACTTT 
80 TCicCTGAGG TCAGGAATTC GAGAC^^C 

nil TG^CAACAT QCTGAAACCC CATCTCTACT AAATACAGAA AAAAATTAGC 
5901 TTffiCATGGT GGCACATCCC TGTAATCCCA GCTACTTGGG MGCTGAA^ 
5951 AGGAGAATCG CTTCAATCCG GGAAGCAGAA GHGCAGIGA GCC6JGATTG 
6001 TOCATTACA CTCTAQCCTG GGCAACAAGA GGGAAACTCC ATCGCJAAAA 

6101 TCCTGQCTTT GCAATTTATT AACTAGCCTT MGTGACTTC CCT6AKTTC 
ARGCACCAAT CTSTAAAATG AGGATAAGAA TATTACTCAT GCCACAT6GT 

fi?m T^aS 6ATTAAAT6T GATAACC^^^^ 

IsItcT^S AGAAAACTCT TAATA^^^^ 
SKJS ffiG^GGCC^^ 

fi-^m RAGfrCAGGA GTTT6AGACC AGCCTGGCCA ACATGGCAAA ACTCCACCIt 

6401 ^Saaaw ACAAAAATAT TAGCCAGGCG TGATGGCACA cacctctagt 

6451 CCCAQCTACT TGGG^ 

fiSl TrAAGGCTCT AGTGAOTGT GATCATGCCA CTGTACTCCA TCCAGCTGQG 

mJ™ ctcamacaa aacaaatgaa 

6501 CTTAATAATC AGTAACTGTC ACTTTATATT ^JIIS^l ??S^lScT 
fifini TATAfACCTA TATGTATACA TTrCTCTTAT TACACATTCA TTGGTeATU 
15 JaIS SStT AAGGOAACT JM^^ 

'ZS^ ^^^^ SSK 

7ftci TTR&RfiAArA TGGGGCATCA ACCTGAATGG TCTTGTAAGA TCTCTCCCAC 
7?ol ScAOTTGC CACTCTTTCT CTGATGAATT TAGAGTACCT GAGTAGTQCA 
7}?} M^TCCT^ GAGSAGGACT CTCCCTCT6T GCTACTCAGA GAAATTCATT 
72m cTTcSc cccttcSgc CTTGCTCTTA CCCAQCTGGG CTACAGTTAC 
11^ AATAMGGAA AT6ACTTTTC TTCTCCCCH CCCCCAGTAC CTTTGTTTTC 

ml SSiG gatattgaat ggagaaattc ctgg^oca 

7^i;i TrnntLrJC rJCCCCTCAT CTCTCCCTTA CATTACCCCA TTCTTCIblt 
7451 GGTGCCCAGG ACMCAGGAA GCTACTTAAA GCTGGAACCT CAGACTGTGC 
7501 MTG6A6GCC AGTGACAAAA CTGAA AGTAG CTCTGTCAGT AATTCTGCTG
7551 GTGCGATTAG GCAGCTGGCC AGAATCTTTT 6GA TCTCCTG GAC ATATGGC 
7601 TGACTAGTCC TCCCAAGCCT TCCCAACAGG CC TCi 1 1 1 1 1 TTCCI M I M 
7651 TCTTrrCTTT tTTTTCTrTC TfTCTTTCTT TCI II III II II IIIIIIAG 
7701 GCTAGTGAAG T6AAATTGTG GGAGTGGAAA AGGAACAAAG AAATCGGTAA' 
7751 CTGGTAGTGA TCAAmCTT GTAAACACTA TtGTACITQG ACCAQC^ 
7801 TAGGCCTTTT TTAAAACTCT GAGTTACCTC TCTTTCCTTT CCTTGAGCA6 
7851 TGCCAHAAT TCTGTATCTG GGGCAATCCT TTCTGAT6TT CTCTGGACCT 
7901 GGCTCTCTCT CC7TAGGAGA GGCCAGGAGA GTAGCCAGAG AGCAT6TCAT 
7951 TTGTAGCTGA GGHAAAGIG TG6AQCTATC A ATGGTG ACC TGGCQTCTTG 
8001 GCATGHAGC AAGCCAGAGG ACCHGACAA CTTTTTTGAT GATPGTCCGT 
8051 TCACCCTGAT CAAAGGTCn IGGCHAGGA GGAGGGAA6A AAAGCTACCC 
8101 CTATTAGTCT TGATGGCCCC AGCGTGGGTC TCTATTGCTT GACCTGGTTC 
8151 CTAGCAGCAT TATCAGAAGG AAMTXX:ACC GCTCHAAGG CTCCTGQG^ 
8201 CTTTCAGGAC TTCCTTTCTC AGGATTGCAA ACATAAGACT ATTTGAQCn 
8251 TCACmTGA AAAGCGGTTA CTAATACCTA TACTCTGGGA MiGGGCTAAT 
8301 GCAGATAGAA 6ACTGTGGTC ACTGCATCAG GCAACAGACC ATTtCCQCTA 
8351 AATTrAGTGA CTCCAGGAAG GCCAGTGAAG AAATAACACA C6TAGCAACC 
8401 AGA6ACTGTG TTGTAATAT6 TTGGCT6ACA GCAGGGTACT TTCTGTGATG 
8451 CTGAAAGCCA CATTCATTTT CTCTCCCCTC ATCCCCATCT AAGCAAGCCT 
8501 GGTAGAATCA TAATTACAGT AATAGGTACC ACHATTGAG tACTGTGTGC 
8551 CAGACACCCT CCTGAGCATA CGACATGCAT AGCACATTTA AtCCTTACAA 
8601 TGACTTAATA AAATGTAGTA CTAGTCTTAC CTACHCGAG AATAGGGAAA 
8651 TGGAGGTTAC TTGTTTAAAG TCACAGAGCT AATAGGTAGC ATAGCTGAGA 
8701 TTTGAACTCA GGCATTCnA CTCCTTGCCT GCA^ 
8751 HGAATGCAA GCATATTTCT TAACCTCACT 6AGGCTCAGT TTCCTCTTAT 
8801 ATAATATGGG GTAAAGAGCC CTCACCCTGC CtGCCACACA CTGGTAGTGT 
8851 CAGATAACAT TGAAQGGT6T TAGITTAAAG GCTTCAT66A CTCTATAATG 
8901 TCAACAAAAG TGCTGHAAC TTTCTTCTGG GTCTCAGGCT CCTGATGTAG 
8951 AGTCAGTGGA GCAACCCTGC CATCTGCTGT TATGCTGHG ATGTTGCTGC 
9001 CACACTTACT AACCTAAACC TTTGAnCTG GCTGTGGCCT TCTCCAGAAG 
9051 GTGTTTACTC AITTGTCCAG TTTATCTnT AGGAAACAGC CAGCCCGTAG . 
9101 ATCATTAAQG CTGGCTAnG GACAGGGGGC TGGGGCCTGC CT6ACA6AGG 
9151 AAG6AAGGGC AGACATCTGG TTCnCCTCT QCCCCTACAA 6AGACTCCAG 
9201 CCTGACCACA GAGTGGTACT CCTAQGATGT AGCA6CAGCA TATGAGCTTG 
9251 AATGTGCCTT AATCCTGCTC TTrACTITGA GAAGAGAGAA CTAAGGACCC 
9301 ACAGATGnr CACAfiCTTCT ATAGGAGGCA GAGGTAGAAA AATGGAGAGA 
9351 GATGAGGCCA GAGATAGATA ACTGATATTA ATTAAACGTT GTAHAAGAA 
9401 CCTCACTTAG AHATCTGAT TCAATCTTCA TAATAACCCT GCAACCCCCA 
9451 CC IIIIHIG AGAACAGGGT CTTGCTCTGT TGTCCAQGCT ACAGTGCACT 
9501 GGTACAATCA TAGTTGACTG CAGTGTCAAC CTCCT6AGCT CAAGCAATCC 
9551 TCCCACCtCA GCCTtGCAAG CAGCnGGAC TACAGGCGTG CCACCACACC 
9601 nGCCATTTT TnTTATTTT AAGTAGAAAC AAGGTCHAT TAATACTATG 
9651 TTGCCCAGGC TGGTCTTGAA CTCCAGC6AT CCTCCTCCCC CAGCCTCCCA 
9701 AAGTGC7TGG 6ATTACGGAA GTAAGCCACT CTGCCTGGCC AGTGCAACCC 
9751 CCATTTTATA CTAAAACAGG AAGGCCCAGA AAGGTTTGGA GTAACTTGTC 
9801 CAGQGTCACA CAGATGATAT TTGAACTCAG GTCTCCCTGG CTCCCAAGAG 
9851 AGTCTGCTTT CCACtAGGAC TCCCAQGAGA AAAAAAAAAA AAAAAACAGT 
9901 AGACTTGGAG ACAGAAAATC TGATTT6AGT CTTAGTTGAG CTAGGCTAAC 
9951 T6TGTAACTG TGGGCAAGTT CCTTAGCCCC TGTGAGCCTC AGTTTCTTAT 
10001 CTGTAAAATG TCATAAAAGA MTCCATCTC ATGGACTAGT TGTGATGATC
10051 MGGACTCTG AAAACAHAG MTGGnTAA TGT6AAGGAT TA^AG^GC 
10101 ACATGGCAAC AHCTGCATC TTATATTAAC TATCCAAATA TATCAAOGT 
10151 catttSSt ATATAAAAGT CATCAAATTA GGCACTGTQG gggatacgga 
lOMrmGoSTAC TAGCCTGGCC TCTTAATTAA TTCA™ 

10301 atgatSt ACTATAGCCT CAATCTCCCA GGCTTAAACA atcctcctga 
10351 GTAGCT^ CTACAGGCAC ACACTACCAT GCCCAGCTAA llllllljlA 
10401 ATTTTn^ 6A6ACAGG6T CnGCtCTGT TGCCCAGGCT GGTCTCAAAC 
5451 tSSctC dAGATCCTCC CACCTGQGCC TCACAAAGTG TTG6GATTAC 
10501 AGGT^TCAGC CACG6CACCT GGCCTGGTCT CTTAACTGCT TCCqTAAGAC 
10551 AGCTGGAAAT AGAGAATGTC ATGGAGCATTXCTMCCATC G^CCAgC 
10601 TGGCTTTCAT TCTGTTTCTC CCCTGAAACA ACATTCCTTT AGTAATATTC 
10651 CGAATAACAG CHCATCAGT CTGTCTACCG AOCACTCTTC AGGCTTaiC 
10701 TTATATGACC TCCCAAACTG CACTAAGGGT TGTATTAGAG AAAAGTGGAT 
10751 AAAGTTCGGA CTCAGGCTGC TTGAGCHAA ATGCCAGCTT CACTTACCAG 
10801 CCACCTtlACC ATGAGTCAGC TGCTTAACCA TTCTTTGCCA CAGTTTCCTT 
10851 GTCTATGAAA AGGGAAATGG CTCCCACCTC AAAAAGHGT TAACATTAAA 
10901 TTCMtStG imC^ CCTGAGCAGA AT6TCTGQCC ATGACTGGGA 
10951 C7WCAGAT GTrAGCATTT ATTAtTAGTA TCTGTCAGTC TTGAAATGH 
11001 CTCTTCCCTT GGCTTTCATG ACAHCCACA CTCTCCTGGT TTTCTCTTAC 
11051 CTCTCT6GTA ATACCIGTH GCTTATCCTT CTTTGTCCAG CTCTGGGATG 

Hioi TTACCATTC? VtSg^SgTG CTCTTTTCTC CTTAGGCAGT CTTACACACA 

5 CTCATCAC7T CCnCCATTG TCCTCCACAC ACTGATGACC CTAAAATGAG 

1 r?Ql TATCTCCAGe CTAAACCTTT CCACT6AGTT CTAGACCCAT ATGTTCTACT 
nisi ATCMCCT^ CT^CCAn TGAATGTCTT CCAGGqAOT CAGACTCTCT 
11301 TCTCTAGACT HGCTOGACT TTCACtCTTC CCCCTAAAAC TGGCTCCTCT 
n351 TcSS CATCTATCTC AITGAGAGGC AC^^^ 
11401 TAAGCCA6AA ACCTAGGAAT CCTTGATACC TGTTCTCTCT CATCCTGCAT 
11451 ATCCAAGCCT ATCAGTnTA TCTCTAAATT ATATrTTGGT AGGTTTACTT 
11501 CTTTCCTTTT CTCCCACCAC CACCCTGCTC CAAGCTACCA TCATCTCACC 
11551 TGGATGTCTG CAATAGCCTC ATCTCCCACA GCCACTCTGC {CCCCCT^ 
11601 CTGTTCTCTA TAGAGCAGH GGAAGGAGTG ATTTTTGTTG TTrGTTTTGT 
11651 TTTCTTrTAG ACA6AGTCTC ACTCTGHCC CCAAGGCTGG AGTGCAGTg 
11701 CACAATTTCG GCTCACT6CA ACTTCTGCCT CCCGGGTTTA AGCAATTCTC 
n751 CTGCCTCMC CTCCCAAGTA GCTGGGAHA AGGCACCGGC CCCCATACCC 
11801 AGCTAATTTT TATATmTA GTAGAGATGG GGmTGCCA TGTTGGCCAA 
11851 6CTAGTCTCG AACTCCTGAC CTCAAGT6AT CCACCTGCCT CGGCCTCCCA 
n9W SlffiG ATTACAGGTG TGAGCCACT6 CA^^^ 
11951 TCTTAAAAAA AAAAAAAACA AAAAAAAACT TGACTGTGTC ACTCTGTGTT 

izoorSaS ccSac nccAc^^^^^^ 

12051 GACCAAAATC CTTAACnGG CCAQGCGCGG TGGCTCACAC CTATCATCTC 
fiol SgGCCGAGG CAGGCAGATC A^^^ 

12151 ccatcctggc caacatggtg aaaccccatc tctactaaaa atacaaaaat 
12201 tagctggtcg tggtggcgtg tgcctgtagt cccagctact tgqgaqgctg 

12251 A^CA^GA ATCACHGAA CCTGGGAGGC AGAGGTTQCA 6T6AQCCCAG 
liol aSS? TOCAG CCTGGTGACA 6AGTAAGACT CCATCT^ 
12351 AAAAAAAAAA AAAAAAAAAA nCCHAATT TGGCCTACAG TAGAG(X(rrc 
12401 CGTAATGTGG CCTCTCTCCA CATCTCCACA ACCTCCTGCT CCCTOACTT 
12451 CAGCCTCACC TCTCTTCTGG ACAGGCCCTC CHCTGACAA GGGCTTTGTr 
12501 CATTCTGCTC CCTCTGCCTA GAATGCCCCC TTACTCTGTT CACTTAACTC
12551 CTGCHATCG TTTAGATCn TACCTQGATG GCTCA6AGAA ATATAGAAGT 
12601 AATTCCTCAC CCTGAAAAAT AGGTTAGGTC CC TGTTTT AT GTTTTCATAG 
12651 ACCTTTCCTT TGAGGCTTTT TTTAAAAAAG TAGTTTTAAT CTCACATTTA 
12701 mATGTGAT CATCTCCTO ATGATATCTT AAGACCTGTA ATAGAAGMT 
12751 TTGGTCATGG ACTGTGGGGT TTTTGCGCCT CATTGTGTCA GCACTGAGCA 
12801 TATTGTTGGC ATAGGAGGGA TATTT6TTGA AT6AATTGCT AGAGGTIGGCC 
12851 AA6AGATAT6 ATGTAAGTCA GGCTITTCCC TGCCCTTCCC CTTCCeCTTC 
12901 CCCACATCCT TCCTATAGCA GCCACCGTGG CTGCAGTTAC TGTAAATGGC 
12951 AAGACGGAAT CAGHCCGGA CAnGGGTTG TTTTAGAAAA TTGCCTGCAA 
13001 GTGTCAGGGT GATAAGTTAA AGCnTGTCT TTTQCCCTCA GAGGAGCTAT 
13051 CCCATAGTGA GTAGAAGCCA GAGAAGCTGA CCCCAGGAGT CCTTCTTTCC 
13101 AGCAGCAG6T CTTGAGCTGC ACnCTCtGT AGCTACAATC CAGGCAGGAA 
13151 CAAGCCCTAG GTACCTCCGG AGAGGAGGGC AAGAGAGGAA GAATGAGTTC 
13201 AGCfACTCTA GCCACCAAAC TGATTATGAA 7TGCCCT6AA ATCT6AAAAA 
13251 TTTCAATTCC AATC6TAAGT TTGnTTGTT TCATTTreTT TTCTTAAATT 
13301 6TATATTTGA AAGATGGCAT TAACTAAAGA 'TATATATTCA ATATAGAGTG 
13351 GAAAAAATGG AATACTTGCA TAGTATCTTT TACTTATAGG TGATTTATGA 
13401 TGGGGAGTGG GGTGGATAGG TrGGCAGTTC CCCCAAGAA6 TTGGAAATGA 
13451 AGTtrGTCCT CTGTGAGTTG AACTAAHAG ATCCACAAGT AATGAAAQGA 
13501 GTATTGTGtr GTAGTtAAGA 6CACACTCTA GAACCAGATT GCnAGTTTC 
13551 AAATCCTGCT TCTGCCtTTT ATTATCT6TG TACTTTOGGC AAGTTACTTG 
13601 CCCTITGTGr GCTTCATTTT TCTCATCTAG AAAATGGAGA GGCCAGGCGT 
13651 AGTGGCTCAT QCCTATAATC CCAGCACTTT GGGAGGCCGA GQCGGGCAGA 
13701 TCACCTGAGG TGAGAAGTTC AA6ACCAGCC TGGCCAACAT GGTGAAACCC 
13751 TGTCTCTACA AAAATAGAAA AATTAGCCAG GCATGA7GGC GQGTGCCTGT 
13801 AATCCCA6CT ACCCAGGAGC CTGAGGCGGG AGAAACACTT GAACCTGGAA 
13851 GGCAGAGGTT GTAGTGAGCC AGGATTGCAC CACTGCACTC CAGCCTGGGT 
13901 GACAAGAGCT AGACTGAGTC TAAAAAAAAA AAAAAAAAAC AAACTGGAGA 
13951 TACAGGCTGG GTGCAGGGCT TACACTTATA ATATCAGCAC TTTGGiSAGGC 
14001 CTAGGGGGGA GGATTGCTTG AACTCAG6AG TTTCAAGATC AGTCTGGGTA 
14051 ACAGAGCAAG ACCTCATCCC CACAAAAAAT CAAAAATTTA GCCAGGGATG 
14101 6TGGCTCATG CCTGTGGTCC CAGCTACTCA 6GAGGCTGAG GGGAGAGGAT 
14151 TGCTTGAGCC CAGGAGGTTG AGGCTGCAGT GAACCATGAC TQCACCACTA 
14201 CAT6CGAGCC TGGATGACAG AGCAAGACCC TATCTCAAAA AAAAAAAAAA 
14251 AAA6AAACGA GCCAGGCGCG TTTGCTCACG CCAGTAATCC CAGCACtTTG 
14301 GGAGGCCAAG GCAGGTGGAT CACHGAGGT CAGGftGATCG AGACTAGCa 
14351 6GCCAACATG GTGAAACCCC ATCTCAACTG AAAATACAAA MTTAGCGAG 
14401 GCATGGTGGC ATGCTCCTGT AGTCCCAGCt ACTCACnGG AGGCTGAGiGC 
14451 AC6AGAATCG CTTGAACCCA GGAGGCGGAG GTTGGAGTGG GCCAACAtCA 
14501 TGTCACTGCA CTCCAGCCTG GGAGACAGAG CGAGACTCTG TCTCAATAAA 
14551 TAAATAAACA TAAAATAAAA TAAAATAAAA TAAAATAAAA TAAAAAAATA 
14601 TGGAGGCCAG CAGGCACGGT GGCTCACGCA TGTAATCCCA GGACTTTGGG 
14651 AGGCCGAGGG GGGCG6ATCA CAAGGTCAGG AGATCGAGAC.CATCCTGGCT 
14701 AACACAGTGA AACCGCGTCT CTACTAAAAA TACACAAAAT;TAGCCA(^A 
14751 TGGTGGCAGG CACCTGTAGT CCCTGCTACT CAGGAGGCTG AGGCAGGAGA 
14801 ATCGCGTGAA CCCGG6AGGC GGAGCTTGCA GTGAGCTGAG ATCGCGCCAC 
14851 TGCAGTCCAG CCTGGGCGAC AGAGCAAGAC TCTGTCTCAA' AAAAAAAAAA 
14901 AAAAATGGAG GTTGGGCGCG GTGGCTCGCG CCTGTAATCC CAGCACTTTG 
14951 GGAGGTCGAG GCGGGCGGAT CACCTGAGGT CAGGAGTTCC AGACCAGCCT 
15001 GGCCAACAT6 GTGAAACCTT GTCTCTACTA AAAHACAAA AATTAGCCAG
15051 ^CGA^ AGGCACCTGT AATCCCAGCT ACTTAG6AGA CTAAGGCAGG
15101 /fiA^AGCTT ^TGQGA GATGGAGffTT QCACTCTGCT 6AGATCGCGC 
15151 C^GOCTC SfiTAGAGTG AGATTCC6TC TCAAAAAAAA AAAAAAAGAA
isZOl^TG^^C^^ 
15251 GTATTAtS^ 

15301 GCACTCW-GA ^TCmGTTC TTT6TTATTA GTTACTAGAG AGGCAAAT6T

15351 ctSaS Ygaataatat GTCTGAATTG GTGATTGTCG cacatatcta

15401 M^GTAGT TATTTTrTTC AATTAAAACT TAGTTTAAAA ACCAATATAA 
15451 ^CGAGCGC AGTGGCTCAC ACCTGTAATC CCAQCACTTT GQGAQGCCGA
15501 TCimU TCAGGAGTTC GAGACTAGCC TGGCCAACAT

15551 GGTGAAACCC T6TCTCTQCT AAAAAAAAAA AAAAAGTACA AAAATTAaC 
15601 AGGCATGATC GCAG6TCCCT GTAAiaCAG CTACTTGGGA GQCCGAGGCA 
15651 GgKtK TTGAACCCAG 6AGGTG6AS6 TTGrAGTGAG CCGAGTTTGT
15701 GCC^GCAC TTCAQCCTGG GTGACA6AGG GAGACACTGT CTCAAAAAAA 
• XmAMAAAA ACC^^ 

15801 AGAAAGTCAA MGTTAGT6A A(3CAAAACTA GTACTGTATT CAGATAAAGA 
15851 TGGT^TCT AGATTTGGTC ACCAGAATAG GGTCCTHGT GQCAACCTGG
15901 GCTAGTTT6G CTCACTCACC ACT6CCAG6A TGAAATTTCT TTCAGTGQCT
StCATTTCC CTTTATnTA AGTCCATGCT CACAGAGCAA CCHCTGATG 
16001 CCTMTTCAfi CTT^^

16051 TAOTCTATA GGGGAW6A GTCITTCTGAT mAATAGTC AATTCATAAG

16101 tgSSgaqg gtttgataaa tggttagctc agaacgatca cagaatgtct

16151 ACACCTCTTT GGACATOGG AAGGTCAAAA ACCTGAAAGG CCAAAAGCTA

16251 CTGGGTGGTC CACCAGTCAA CHTCmG ATCACACCTC CnCGTCGTT 
16301 GCTTCTTTAA GCAHGAeCT 6TAATGGGTA TGGAATTTTT TGCTCACCTA

16351 Kct m«G« aa™ akccag^ gjttoatcg

16401 CTTGCCTAAG ATCACACGCA GA TTTTCTG T TAACCAGGGT GATTTJJ^G 
16451 GTGTTCCCTG CCAGACGAGG GCTTTTTTCC TTGAATTGCC TAGAGATTJC 

16501 ttgaStatc cgaSattt ttcccagtgc agcctggaga aggatgtccc
16551 tctcaacaca gcatttgtta ctcaatgtta gacattcaat tttctaatta 
16601 GTATCAT^ oaacagtgg atgattatct ataaggggtt gcaattccat
16651 gcttSgtgc ttaca^cca tatagacaaa tatcagctgt taaaatgaca
16701 StSc caggacaaag gcatactctg ctgttagtga 

16751 SStTC aCAGCAAAT TTCACATGGG CATATACACG GCCAJCTCTA 

16801 gX?™ ATHATACCC attcagagag oamct^ mc wgat

16851 rAGCATTCTC TTrGGCATTT CAGCTTTGCG TTCTGTTAAA AATCACTGCT
16901 TOTAAATA CCTC^AG CTCTTCACTG CCTGTAGGCA ACTCTTOC
16951 CTAGCAGACT TGGTCTTTAG TGCTGTGCCC CTACTCTCTT CCACCATTCT 
17001 SccTccTGT ™ttgctg CCCATATGTG CCATGGACTA GAGCTTACAG
17051 SStCAG cSSgA GCATAGWTA ^^^^^
17101 TtSStGT TCTTCCnCA GQGCAGAATG CCTGTTACTG cctqgcaatc
17151 AGTCTGCCAA TACCATCCCA TCnCTGTGG AGGApqCCCC

17201 CGCCAAATCC ACCCATACCT CTCCCCACCA ATCAGAGACT TCTTCTCTCT 
TTC^Cr CTT^ATT CTCTTCATAC CTCAGTTATA TCCATHCAG 

17301 UmSm ^t^CcTsi ATCACTCTTA GAGTGTGAAA TTCTCCAAGT

17351 GTGGAGCCGT ATaAGTTTG TCTTTGTATC CCAGAGCHA GCAAAGTCCC
17401 taSatgtag tS^qctca gagighttgc tgqgtgaatg atgtajttct
17451 T^GACTC TTTOACACT TGAATAAAGT CCATCCAGTA TGCACCATTA 
17501 CCATCTCTTC GCTCTACAAT AnCTTTTAG GCAAGAGCTT ATCTTTTGAG
17551 GTGATAAfiAT AAGCTCAAAC TTATGTAGAC TAAGACCTCA GTCTGTAAAT 
17601 GTCATCCCTA AGTCHAAAC CATCAAAACC AGGGCCTCAA GGAATGGCAT 
17651 GCCTTCTGCA ACTGTAGCAA CCTGCTGTGC TTATTTTGCC GTGTTTTTCA 
17701 TmreCCCe AAAAGCTAGA GTCCenCTC CCATGGGCAG TGCTGGAAfiT 
17751 GTGCTAACAA ATTCTTTCTG CATACTGCTT ACGATTACAA A^AAJpCCT 
17801 CAGCATCTCA TQCCAGACTT GAGTrAAQGT TGTTTTCTn TGTCTgTC AG 
17851 CT6TA1TCTG GTCATGACTT CCTGATGATG CCCTATAGAG ATTrTGCTGA 
17901 GATCAGAGGG TGCTCCACTG CCATCAGTAG CACT6ACTCT TGCA6AAGCA 
17951 CCGTTTCrGA AG^-GGCTAA TGTCATCCCT CACGTTTGn TGnTGAAAT 
18001 TTGTTTrAGT TCCAGAGATA GCAC7TTCAT G6AATGACQC TATCHCTAG 
18051 AATGACTTTT Ili l lHI I I TGAGTTGGAG TCTCGCTGTG TCGCCAGGCT 
18101 GGAGTQCAGT GQCACAATCT CAQCTCACTG CAATCTCCAC CTTCCGGGTT 
18151 CAAGTGAnC GCCTGCCTCA GCCTCCCGAG GAQCTGTTAC TACAQGCaA 
18201 CACCCCCACT CCTGGCTAAT TTTATGTGTr HAGTAGAGA CGGGGTTTCA 
18251 CC6TGTTCQC CAGGATGSTC TCGATCTCCT GACTTTGTGA TCTGCCTGCT 
18301 TCAGCCTCCC AAA6TGCTGG GATTACAGGT GT^GTCACC GCKCTGGCC 
18351 TAGAATCACC TTTTTATACC ATAACGTGAG CACCACTGCC GCGTCACCAA 
18401 GGAAAGAGAG AGGCAGCTAC TGTGGGGITA CAAATGGGTA AGAGTGGCAC 
18451 CAGGAAGGTG AAAGTCTCTA CTTAGCCAAG GCTTAACAAA ATGTCAATCA 
18501 CCAAACATTT ATnATTAAG CTACGTTCAG GATMGAAGA Jg^T 
18551 ATCT6TACAT TCATTTTCTC GnTGnTAACA AGCTAATGAT AGTGATCTAT 

18601 CCTGCCTCGC TCTGAGGGTT ATTGTGAGAA TAAAATCAM TCAAffmA 

18651 AAGCACTTA6 6AAAAAGAAA AGCATTGGTT TTCAATTGIT ACTGTGGATC 

18701 A^CACTC GGGCTTGm AAAAT6CAGA niniAGCCC CA^
18751 GATTCTGAn CTGTATATCT GAAGTGGGAC JCAGGAATCT TGATTTTCAA 
18801 CAAGCTGACC AGAGGGTCCA ATGCTGCTAT TCCTTTACTT A^CTTTCAG 
18851 AAATATTACT GTAAATCAAA TGGCAAGAAT AAAATAGTTA TTTGAGQWG 
18901 TTnAGTATC TTGGACCTGG AGTCCAAAGA CnGGGTCAA ACTCCAGCH 
18951 TGTCAGTTCC TAGACCTSTG ACCTTAAACA GCAACCTTCT CTGTGJACCT 
19001 TAGTTCCCTe AGGAACGGCT CTGGTCACCT CCTGCT6TAC TCCAHGATG 
19051 ACTCACCACA TAAGGCTCCC TGGGAGTCCC CCAAACCTTT ^TCTCTTM 
19101 CTCCTTTTAC AGCCTCCTAC ATCTCCTGCA GGTGCTGTCT TCTCCTOCTT 
19151 TTTCCAGGCC CTGCTCTGAC ACAGCAHCA HCTCCTCTG ^GGGTTC 
19201 CTTCAATCTG TCTCCAAGCA CATCACACCC AG6AAGGACC CTGTGGG^ 
19251 ATCTCTCTAT CACCAGATCA AACTACGTGA AQGCAGGCAC TAGGTACT6T 
19301 CACTGCCCAG CATAGGCCTG GCCCATACCA GGTCTCCACA GATGCCTA6T 
19351 AAAGAAACCT AT6ATTCAGG ACCCCCAT6A TGAOAACTA JAGCACTAGA
19401 ACA6TGATAA TAACTAATGT TTATAATGCA TCTTCAGTTT ACAGAGGGCT 
19451 TnGTACTCA TCATCTAGTT TAGTTCCTGC AACAACCTGJ TgG^TAT 
19501 AGCACAAGCA GGACAAGGGA AGCCCAGAGA TGTTAAATAA TTTATCCAAG 
19551 TTTATGCtGC TGGGAAGGGC AGCACTGAAA HAAAAGAAA AGTTTrCTGA 
19601 GCTCAAATCC CATGCCCTTT CCTCAATGTG AQCTCTAGCA AGGTATTCAG 
19651 GAATCCTGCC TCTACAGHC AGAGCCTCAA ATTGCT6GGT ATGnG^ 
19701 CTTGTATCTG ATTTTTCTAG AHTCCTGCC CACATTCTTA CTG^JG^T 
19751 ATCAGGAAAG AGTHATCAA ATGCCTGTGG AAATCCAAGA TAA^CTCA 
19801 TGAT6AGTAA CCCAGT6AAA ACAT6AAGTC AAGTCTAACT AGTCAC^CT 
19851 ATTTCACTAC TGCTGACTCC T6ATGATCAG gCCTTTTeT AAGTqc™ 
19901 TGTCCACTTA HCCATCATC TGCCTAGAAT TT ATGTGAA G 6AATCAAAGC 
19951 AAAAGGATCA TAAGGCTTCC TTTTTCCAGT ATGrTTTTCC TCCTTTTTGA 
20001 AAACTQGGCC AGTTAGCTAT CTCCATinT ATTTCATGAA TACATCCCCA

20051 GCGCCTGGTA TATAGTAGAT ATGGAACATT ACACTTTGGA GATATTGCAC 

20101 CCATTCTCCA GTHCTCCAA AGHACTAAC AATGGTTCCA TCACTGTGCC 
20151 AACATATTTT CTITTTTCAA TATATT6GGA AATAATTCTC CCAGTCTGAA 
20201 AATCTGAAGA CATTTCATGT GACTfGGTAT CCTCATATGT CnGeGCTTC 
20251 CAATTCTCCA TTCCTAGTTT CAAGTTCATG AACTGTAAAA CAAAGGATTA 
20301 GACTAAATCT CTAAAGTTCT ATCCAGATGC CAAATTCTTT TCTCTTTCCA 
20351 TGATACCTAA GATAGATGCC AAATAHGTC TTTTACCTGG TGfTTTGTGAA 
20401 CAT6ACATCA CAHACAGiSA GTAGCAGATA CTAAACTCTC ACTCTCTAAA 
20451 ACACTGACT6 AGTITCCATGA GCCAGATACT GAAGTGAGCT TGTTCACATA 
20501 -retTCTCAn TAATGCTCAT AACCCTGTGA AGCTGGGAAT TGCTGG6ACA 
20551 TnTATTTAT nATTTAnG AGACQGAGTC TGGCTCTGTC ACCTAGGCTG 
20601 6TGTGCAATG GCATGATCTT GGCTCACCGC AACCTCCGCC TCCCQGGTTC
20651 AAGCGATTCT CTTGCCTCAG CCTCCGCAGT AGCTSGGATT ACGGQGCACA 
20701 CACCACCACA TCCAGCTAAT TnGTATTTT TAGCAGAGAT GGAGTTTaC 
20751 CATGTTGGCC AGGTTG6TCA CGAACACTTG ACCTCAAGTG ATCTGCCTGC 
20801 CTCAGCCTCC CAAAGTQCTG GGATTACAGG CATGAGCCAC CATGCCTGCC 
20851 CGGGACCCn GnTTAGAAG GATGACTGCT GCTATAATGT AGAAAGTGAT 
20901 TTGGAAGAGG G6AGGAGTGG GGCACGAAAG ATGGTTAGTA GATGQGGGTG 
20951 6TAATGCTTA CCTTTCAGTA TTTOGAGGCT TCGGAGTCCT CAAAAATTCT 
21001 CTTCCTTGAT TGGAGTCCTC CCAGCCAATA GAGGGCTTCA CACAAACAGT 
21051 ncnGGGTT nGAATTGTT TGACCAGAGC TnCTTCCGA CAAAAGGTTG 
21101 GGGTGATTCA nCACTTACC ACACCTTGCC TGAACATTCA CnGGGGCTG 
21151 CCQGTTATGA AGGCTATTGT TCTCCAGCCT GTCACAGACG CTTTGAAGAC 
21201 CTGTGCCTCA GCTGGtrCTA AGGAGTCAGT TTGTTCAGCT CCGTGCCAGG 
21251 mCCAACn ATGAAATGTG CTGGAGAm ACACCTCTCC TGa:ATm 
21301 TCCCTACTAT AAHGCCAGT CAAAGGAHC CTGCAGHGC CTCTQGCAQC 
21351 CATAACTGAT GAATGTTCTG CCAGCTGCTC TGAQGACCTA GAAGAGCAGT 
21401 TTTCTATCCA GGACCAGTTT CCAAGGGTGG GAGQGTGAAA TATATCCTCC 
21451 AGTGTGACAT HCATCTCCC AGTGATGGGT GQCnGGGCC CTTTGAAGTT 
21501 GGCTCTGAGG AACCACACAC nGGGTCTGA QCAQCCAGCA GCTTATCACA 
21551 TCTGGTGATC AATCCHCAA AGGTTCCTCC TGAAGTCTGA ATmTGGAG 
21601 6TCAAATGGA TTCCACaGG GAGGGGCTTC TGCTTCAACT CAG60TGG 
21651 GGAGAAGGCT GTTCCTCTTC CAGGGGGAGG CAGTTTTCAT GGCAHGAGA 
21701 TGTCCTCTCA CTTATTCCCC ACCCACCCAC CAA6TCCTTT GTAAGAGGAG 
21751 TAGGGG6AGA GGAGAGCGCC TGCAGCCTCC TGCTCACATT CCTAGACACC 
21801 GACTCACTGA GCCCGTCGCC GCTGGAACAG CAGAGCTGTG TGAAATGTCA 
21851 AGAGGAGTTA TGCTCATAGG CTCCCT6GCC TCAGTCTCH TGTGGCnGC 
21901 ATATTCTTCC ATTAGTACTG TffTTCATCAC ATGGAAATCA GAGGGTACAA 
21951 TTAAAA6ATA ATTTGCTAGT CCCAGACHA ATITGQKCC CCqTTCTTQ: 
22001 CTGATTGAAT TAGAGGQGAA CATAATAGAT TnTGGTGAG AAATAGTrGT 
22051 CTGTGTGQCT GGGAGAAAGA HGCTCCCAG CTCTCCAGCT GQGCAGGGCT 
22101 TTCAGTATCC CGTATGTTAT TTCCCCACTT CCAQCCCACC TCACCTCCTC 
22151 TGTGGCCCn GTGTGTCCCC TCGGCTAGGA TCCT6ACCTC CTGCTCAA6A 
22201 6TTTAAACTC AACTTGAGAC CCAAGGAAAA TAGAGAQCCC TCTGCAACCT 
22251 CATAGGG6T6 AAAAATGTTG ATGCTGGGAG CTATTTAGAG ACCTAACCAA 
22301 GQCCCAGACA GAGAGAGTGA CTTGCTAAAG GCCACATAGC TAGCCCACAG 
22351 TAGTTGTAAC AATAGTCTTA ATGATATTAA TGGCTAACAT TTATCAACCT 
22401 TTAATGTGTC CCAGACTTTG TGCCAAGGGC HACATGCAG TGCATT6TCG 
22451 CAHCAAACC CAGACAGTCT GGCTCTGQGC CCAGGCTGAG CTTTGGTATA 
22501 GCATGCTAGA ACGTTGTCTA TAATGTCTAG TCTGGGTTCA AATCCTGGCT
22551 TCACTTCTCA CATTTACAGC TGAGTGACCT CAGGCAACTG ATTTAACCTC 
22601 CCTGTACCTC AGTTGCnTA TCTGTAAAGA GAAAAATCAC AGCACTGTGG ^ 
22651 AATAGTGGQG GTTAAAAnC ATTCATACAA GTAGTGCTCC AAGCAATGTT 
22701 TAATACAGGG TGAQCAaTG TTCAGTGGTT CCTOmiG GCTGCCTGTG 
22751 GGGCTAGAGT OTGGTGTCn CGTGGTAtftG ATAGATAGAT ATGGCTGAGC ^ 
22801 TCTGCACAAA CACCAAGAGC TCnCTTCAC TATTAGAGGT AGTAAACAGA 
22851 GTGGnGAGC TCTGTGGnC TAGAACAGAG GC CGGCAAGC TA T66CCCA T 
22901 TGCCTATTTT AATACGGCCT GTGATTGAn GAI 1 1 1 i 1 1 1 TTCTTTTTGA 
22951 GACA6A6TTT CACTCnGTT GCGCAGGCTG CAATGCAAT6 GCACGAACTC 
23001 AfiCTCACCGC AAaTCTGCC TCCTGQGTTC AAGCGATTCT CCTGTCTCAG 
23051 CCTCTCGAGT AGCTGGGAn ACAfiGCATGT GCCACCACGC CTQGCTMTt^ 
23101 TTTGTATTTT TAGTAGA6AC AGGGTTTCTC CATGTTGGTC AQGCTACTCT- 
23151 CGAACTTCCA ACCTCAG6TG ATCTGCCCGC CTCAGCCTTC C AAAGTOCTG 
23201 GGAHACAGG CGTGAGCCAC CATGACTGGC CTGATT6ACT GAI I I 1 1 1 lA 
23251 6TAGAGATAG GGTCnGGTT TGHACCCAG GCTGGTCTCA AACTTCTGGiC 
23301 TTCAAGCAGT CCTCCCTCCT TGGCCTCTCG A ATGCT6G GA TTATAQGCAT 
23351 GAGCCACTAT GCCTGGCCTA TAT6ACCTGT GATmTAAT GGTTAQGQGA 
23401 AAAAAAGCAA AAGAATGCTT TGTGACATGT GGAAAHACA TGAAACTCAA 
23451 ATAtCAGTGT CCCAGCCT6G GCAACAAAGT GAGACCCTGT CTCTACAAAA 
23501 AATAAAAAAA AATAAGCCAG iGGCCGGGCGC AGTGGCTGAC ACCTATAATe 
23551 TCAGCACTTT GGGAGGCCGA GGCAA6TGGA TCACCTGAGG TCAGGAGTTC 
23601 AAGACCAGCC TGACCAATAT GGTGAAACCC TGTeTGTACT AAAAACACAA 
23651 AAATTAQCCG AGCATGGTGG CATGCGCCT6 TA6TCCCAGC TACTTGG6AG 
23701 GCTGAGACAA GAGAATTGCT TGAACCTGGG AGGCGGAGGT TGCAGTGAGC 
23751 CAAGAtCGCG ACACTACACt GGAGCCrGGG CAACAGAGCG AGACT(XGAC
23801 ACAC6CACGC ACGCACACAC AGACACACAC ACACACACAC ACGCTQGSTA 
23851 TGGTGGCCAG CACGT6TGGT CCCAGGATGC ACTGGAGGCT TAG6TAG6AG 
23901 GATCACTTGA GCTTAGGTGG TTGAGACTAC AATGAACCAT GTTfATACCA 
23951 CTGCACTTTA GCCAGGGCAA CA6TGTGA6A CTGAATCTCA AAAGAAAAAA 
24001 AAAAAAAAGA AAAAAATCTT TCCATAAGTA AATATCTGTT GGAACATAGC 
24051 CATGTCCCTT AG7TTATGTT TTATATATGG CTGCmTGC CCTATAATGA 
24101 CACAAHGAG TGGCCAC6AC AGTCTGTATG GCCTGCAGAG CCTAAGATAT 
24151 TTGCTCTCTG GCCCTTTACA GAAAAAGTGC CrTGACCTGT GCTCTAGAGC 
24201 CATATGTACC AGGTTTGAAA CTCAGCCTCA CAGCTGGGTTG TGATBGCACG 
24251 CATCTffTAGT CCCAQCTACT CTGGAQGCTG AGGTGAGAGG ATCACTTGAG 
24301 TCCAGAAGGT CGAGGTCAAG ATTGTAGTGA GCCATGATGG CATCACCGCA 
24351 CTCCAGCCTG AGTGACAGAG AGAGACCCTG ACTCAAAAAA AAAAAAACAA 
24401 AAAAAAAAAA CACCCTCACC ACHATGAQC TATTTGTCn GAiSAATAGTG 
24451 ACATAACCCe TCAGAACCTA TTTCCTAATC TffTTAAATGA GGCTGATGAC . 
24501 GTnCCTCCT TtTACTGGCA ATTTAAACAT 6ATGGATAAT AAATCCTAAG. 
24551 CACHAACAC AGGGCCTAGA AGATATTAAC TGCTCAATAA ATQGTAGCTT 
24601 CHAACAGTA TTCAAACCCA TGTQCTCnA TCACATGCAT TGiTGTCCCT 
24651 GTGTCCAGTT GGTGGAATGG GAAAAGQCTC CCTTGTAACC CCATCTACCA 
24701 TCTTTATCAG AeTTTCCTGC CATG6TTCAC AGTAAGAGAT AGAAQCTGCA 
24751 CGGTGACTTC TGGCTCITTA CAATGGTGAG CGG7GTGTGC CTGGTAAGG6 
24801 AGAGCTGATG TCACTGCCCC AAATCCAGTA GTGAGATCTG AGT6TTCTGG 
24851 TTTCCTCCAG CAGCCTTGCT TTTrCCTTTA CAATCCTQCA GGCAGGGAGA 
24901 CAAGGGCTTT CTACATGGTA GGCTCTGGTT TGGTCATCGT CACAACTGGG 
24951 GGCTGHCAG GTGGGCTCCC AHCCAGATA CCTAQGCTTA TCAATCCCTT 
25001 TTGGCACCCC AGGCCTTTTT CTCCCTCATG CCCCATTin CAGTTTGAAA
25051 AGCATGGHA TCACAQGACA AGTAGAAGAA GCTCCACTGT CCACTGAGGC 
25101 CAATGGATGG TGTTCTGCAT GT6AACACTC A6TGAATAGT GAjST GAATGA . 
25151 GAGTAACCTG GGCTCCATCC TATTTGCAGA GAGCTTTGGA AAAGATTnT. 
25201 CTCCHAAAG AGCCAGAATG AAGCCTGGTA GTGG6A6A6C TCCA6CTCTA; 
25251 GAGTCACyvi-G AGCCTACATT TAAATTCCAG CCCTGCCACT GACTCCCTTT
25301 TTGACCTTGA GrTGAGnTACC TAATCTCTCT CTACCTCACT TnCTTGTa 
25351 Kgtggg AATAATTCCT GTCTCAGAGA AATAAAAGAG jgcatatagt 
25401 GmaCACA TGGAGACACA TCAGGTGTAG ghaatactc tgggccttct
25451 TTCCrrATTT gcaacacaqc CCTGCCCTGG agtggaagtg gcacctccca. 

25501 TTG6TCAGCT CHGAGQCTG TCCCCAGGAC AGGCAGAGGG AGGGAAW 
25551 TGGGAGCCCT AGTGCCAGGA cagaacagat ggcagctcag agcta^jg. 
25601 GCTCTCTGGA CCT6TCTCTC CTACCA6AG6 TCCCCCCG TC TGGTGTgQT 
25651 CTTCCTGGAC CTGGCATCCT CTGCTnTTT TTTTTTTCCA CCTCCAAGCA 
25701 6AATTACTGT CCTCTAGGCA GCTCCTCTGC TTGAG6ACAT CTGGG^G 
25751 ATATGirCAC ACTCTATCCT GCCTTGCCCT TCCCTGAGCT CAGGATGGAC 
25801 aTCAA™ TCCCAGTTAT TGTCTGCAGC GCCTGCCTGC AGCCTCGATC 
25851 SiSf CCACCCCHG CCTGCAAGGT CTGITTCCTA A^^^C
25901 CAACCACACA CCTCGCnCT GCGGGAGCCC CTCCTCTTCC TCCaCCCTC 
25951 CCTCATtSg GGSrGGGACT GAAGAAGAAG GCTAACTTGA CAGCABCGCT 
26001 TCnrCTTAG CTAGTCACC6 GCCCCTGCTC AAGAATGCCA GTGIISTGTCT
26051 AGCCTCCACA GA6AGGTCGT TTTCTCGGAG TCCAGAGGGG COG^GA^ 
26101 TTCTGAGAAC TAGGGAGGAG CCATCCCAGC CAJGAGCCCC TGTGGGAATC 

26151 StS c!!agtggcct ggagtcctca ggctcccgca gcto^gq

26201 AGGGAGAGGT GAGCTCAGGG CAGCCTGCCT GCAGCCAGAG GTGCCGQGjq 
26251 CCCCGGGCCT GTCATGCTGG CCATCTACAG ccggcctgag gcagtcacag

26301 ACet5ATTTGC AGCTGAGCCT 6TCTATCT6G TGTGGGAAGA AGATGGGGAG
26351 TT^CA ^C^rr ACTTCACCTC CAGAGACCT6 TTTCgrGAG 
26401 7TGCTCTCCG AffTTCCCCTC TCCATCTCTC CTGGCCCCTG 6TCCT6AGAG 
26451 6AMGTGGTC TCCCTAAATC TCCTTCTCAC TTAGTCCTTT ACCATCG6TT
26501 CTGCCGGGCA GAAGCCAGGG GAGGTTATAC CCAAGGAGAA TCGGCCTTCT 
26551 GaScCC CAmiGTCC TGGAAGTGGT GAGGGGAGGG ATATACCCAG 
26601 AAGGAACTTC HAGGGAGCT CCAGCTCCCC TTCTATCCCA GACAAACaj 
26651 AAGGAGCCTC CAAAAGATQC CACTGACCTG CCCATTGTAG ATGTTACTg 
26701 TTCCGGGGGG AATAGCCCAA ATAGAGTGCT GTTTCCAGCT CTWTGTC 
26751 HACCTGCGG GCCAT6CTGC CTGCCCAQGA A7TTGTCCCA ACAJgAGGA 
26801 TGGGCAGGH HGCCAAACT GTCGAAACTG GCAAGTCCTG GGTG^ 
26851 GCCTGGTACA CAGTAGGCAC CHATAAACG TTTGrTCTCT JAATQ^GG 
26901 CACATTTGCC TQGGCCnG AAGGGCHCT 6AGCTCCCAG GTGAATGJAq 
26951 rreeiGGGGA AAGACCTGGG CGAGITGCTTC TAAGACTG6A GCAATWT 
27001 TTAGaS CCTGAGCTGC TGGQCCAQCC CCCACACCTC CTCAGTCCCT 
27051 ACCTCCACGA GCCTCTCTCT GTG^C TCAGAGQGAG

27101 ATGTGGAAAC TCTACCTCTA ACCTGGCITT CTTTGCTCAT JGCCCCACTC 
27151 CACCTCCCAT AGAAACTCCC CAGGGGGTTT CTG6CCCTCT GGCTCCCTTC 
27201 TGAATGGAGC CA1TCCAGQC TAGGGHrGGGG TnGnTTCA TTCTTT^ 
27251 GCAGCCTGH GTTCCAAAAA GGCTGCCTCC CCCTCACCAG TGGTCCTGGT 

27301 Egactt^cc CTTCIGGCH ctctaagcta ggtccagtgc cc^ttcttg 
27351 ctgccgggat actagtcaqg tggccaggcc ctgggcagaa aagcagtgta 

27401 CC^GT^ TTGTGGAATG ACCGGACCCT GGTAGAHGC TGGGAAGHPGT 
27451 CTGGACAGGG GGAAGGGGGA AGG6AACTGG TCCTCAATGC TCACTCTACC 
27501 MQCGCaiG CTAGACACH TATCdTTAA TCTCTCAACA GCCTAAAGAG
27551 ATTATATATC CCCATTTTAC AGATGAGGCA ACCAGTTTCA ACAGAGTTAA 
27601 CATATGGAGC CTCACTGGGC AGCTTTTTCT GTCTTCCTGA CT7TCTCTCA 
27651 TCCTTCAGGG GGCTGCAGGT TTGTTTTCTT CTCCTAGTGG AGAGGAAATT 
27701 CTGAGGmG TITTCCTCTC CTAGCAGAGA GTAAAAAAAG CGATAGT^ 
27751 CCTGACTfGT TGAAGGTGTG GCTGAGATTG TTTTCTAAAG AGCCAATGGA 
27801 AATTGATCTT GAGTTTAGGA GAAAGCTTTT ACATGTGGAA TTAAGATGCC 
27851 AAGTCTTGAA GTAGCCACAT TTCAGGTCCT CATTAATnC TCTTAATCCT 
27901 GGGAAGGCAG CTTAGGAGAA GGGnCTTCC TTTAG6AGCC AGGAACTATA 
27951 CCCCTTTTAC CCniGGAGAG GCAGG6AAGC CAGGGAGGAC ACAACnCTC 
28001 AGGAA6AGGA GAAGCTAGAG CAGATAGTGA ACTCTCAACC TGAACCTTTA 
28051 AGGGCCAGAC CACtAATGCC ACCCAACTCC AGaGCCGlT TGTCTTGTTC 
28101 TGTCCCAGGC TTTCTGGAGA ACCTGATCTT CTTGCCGCTA CCCCCAAGCT 
28151 CCGITTGCCC AGCTAGAGTC TGGGGGGTAC TGACTGACTT TCCTA6ACAT 
28201 TCTTCCCTTC CCCAAATAA6 AGGCCACATT CCTGAAGTCA CTTCTGAAGA 
28251 GATAGCTGCC ACACAGGGCT CTTTCCCCCC AGGGAGGGAC CACCCAGACC 
28301 CTCTGCTCTC CCAGGTATCC GHACCACAT CACTACCTGG TCAGAAAGCT 
28351 GTTTCTGCCA nAQCCCCTC CCTCTTTTAT TATAG6ATAT CCTCAAGGGC 
28401 TCCJCnTGG GCCTCAGTTT CATCCTTGGC AGAAAGTAGA AGCTAGACH 
28451 CTTGGGCTCC T6AACAGGGT CCTTGCTGGA TTCTGT6AAA CAAATTAAGT 
28501 TCTTGACCCT AGGCCTCTGG -GGGAGTACAA AGTCTATGGG AGTTCTGGQG 
28551 CTGTGGTTGC AAGGWAAGTG ACGCAACCAG ATTCCATGGG GACATGATGA 
28601 GGC6TGACAT GTGAGGGAGG AAGAGGGAGC AAGGGAATGA . A^5AATACAAC 
28651 TTCTGTGTCC CATACACCCC TGCCTGACAG GCCATACATA CTCAGCAGAG 
28701 AATGCAaGT CnrCCTAGC AGACTAGCGT GAGGAGTGAG CTGCAATTAG 
28751 CACTGTGCTt CCAAGTAAGA AAATACCtCA AATtGGAATT TACAAAAGAG 
28801 GTAAAitAGG GAGTGGCTTT T6TCGGACAT CTTTAAAGCA TntrCTTTT 
28851 TATAGAAT7T CACTTAAT6T CCAATACTGA TTTAATGAGC TTGGGnTAC 
28901 ACAHATCTC TTGAAGAAAA CAAATGAACC TrTGTGlTCC AAAGCAATCC 
28951 ATGTTTAAAG GGAAAAAATT ATGCATAACT CTGCCCAGCT TCACAGTAAC 
29001 CnTGGCAGG TGCCTTAGGT CCTCTGGGAC TCTTTTCCTT ATCTGAAAAA
29051 TGAAG6ACTT GGATCAG6TG AATGGTTCCC AGCTCTGCAA CnATGTGGC 
29101 TCCTCA6AGG CACACAAGCT CTTTTCCATr ATTTGCCAAA TAATGGAGQC 
29151 CCTGTCTTTA ACTGCAGrAC AACTACACAA AATACTTGAA ACTACAGTCT 
29201 TCCTGGTnr TGGTTG6AAC TGAATCAGTG CACTCTAGCA ACACTTATTT 
29251 CnGCTGTTC GTAGGCTTCA TTATGTlGnT QGTTAATTTT HAAAACAAC 
29301 AATAACATAT TCCATAATAA TTACAGCTTA ATTOGCAGAC TBTrrCAGTC 
29351 TATAGGATCT GCAGGAAG6A 6GAGTAATAA AGGGATmT 6ACTGAGCTC 
29401 TTATGGAACA GAGTCTCTCT AGQCCCCTGT CATATCTGCC CTTCTGGGCC 
29451 CTGGGGAAAA GTTGGCATCC CCAGnGTGG TGGTCTCCAG GTGCCCTCAG , 
29501 GCTGrGGTGG AGGGAGCnC CCATTCTCTC CTTCAGCCCA CTCAAlfeAG 
29551 AGGCTAGGGG CT6AAA6AAG CHCTCTACA ACTGGCTGn CACTGGGAGG 
29601 TTAAQGGATG ACCATCCAGC CAQGCCTFCC TCAGGACATG GGAGGGCTTA 
29651 TGCTTTAACA TGT6TAAATC CACTGCAATA ATGACTG6TT CTTTTACCCC 
29701 ATAAQGTT6A GAATUACCT 6TAAACATTT TTGTCTGAAG AATTTGGATG 
29751 TAAGT6AGGG CTGGGCCTCT ATCTTATCTC ACTTGGCnC TCTCAGCACA 
29801 GCACCTTGCC TGCTTGnCT TACACATCCT AGATISCACAG TAACTATTTC 
29851 CTAAnATTA 6AAATCTATT A6AATCAATT GATTTCAGCT GQGCTTGGTG 
29901 GCTCCTTCCT GTAATCCCAG CACTTTGGGA GGCTAAGGCT GGAGGATCAC 
29951 CTGAGTCCAG GAGTTTAAGA CCAGCCTGGG CAACATAGGG AGACCCTGTC 
30001 TCTACAAAAA ATAAAAAATT AGCCAGGCAT GGTGGTGTGC ACCTGTAGTC
30051 CCAGCTACTC AGGAGGCTGA GGCAGGAG6A TCTCHGAGC CTGGGAGGTC 
30101 AGACTACAGT GAGCAATGAT T6TGCCACTG CACTCCAGCC TGQGTGAMG 
30151 AGTAAGACTC TGTCTCnAA AAAAAAAAAA A AAAAA6T TG ATTTCTATTT 
30201 GGATAGATAA ATAATTCATT TTAGGACGTT TCTTTTTCAC HACAGAAAT 
30251 StcATT CT^TGAG AAGCAGGTCC ATATTGeTAG GCATAGGAGA 
30301 AAAAGQGGTC TGTCTGCATT TGCCCTTGGT GGTCTCAAAT TGQQG^GGA 
30351 AAGAAAT6AA CACTFACTGG CTACCTTCTG TGAGCCAGQC ATCATGCAAG 
30401 ACATCTGTAC ATAATTTAAT TCTCATAACC CCATAAGATA JWTA^ 
30451 TGTACAA6TG AGGAAACTGA GGCTCAGAGT CATGAAGTAA CTGGCCTTGG 
30501 GTGACACAGA TGGTAAATGG CAGA6AAGGA ATATGGATCC AGGTCTTGAA 
30551 AGAGAAAATC TCAACTGATT ATCTmTTA AAAAACTCAT ATGTTCTCTG 
30601 aGACtCAAA AGGfTCTCTGT GTGGATCTGG 6TTGACCCAC TGAACT 
30651 ATGAGGGTTC CATGCACTTT 6TATCTGCCC AAGCCCTCAG AACCCCTCAG 
30701 TAATffmTG GAAGATGAGT mGGAGGTT GTCCHAGQC ATAQCCTCAG 
30751 CGTATCTAQG CCTCTAQGTG ATCTCCCCTA ACCTGAGGAT TTCAGCTCAA 
30801 ™CTCTGG CTCCTCAGGA CAGTGGGATG ACTGGTTCAG ACCTCAGCTT 
30851 TACCACCTCC CAGCTGGGTA CTCTTCTACC JACAGCCAGG GC^TT^ 
30901 ACTTTCACTT 6AAACTTCCA AAAATTGAAA GGTAGAAAAA CAGCCTTGQC 

30951 tttgggaAga acgtatgatg tccatggcct ctaagcatct gaggtgggac 

31001 ATCTTC6AGT AGCACCTTAC AGTTCCAAAG TGTGTTCTGG GITCnTGTT 
31051 TAAAAGAACA' GAGACTGCTG GQGAATTGAA CACTGTGAAG TATAT6AAGG 
31101 aKattg TGCTATTTAA CATTCAGTAC HGGGCTAAA GGAGAAGCAT 
31151 CAC6AAGTGT TAACACTCAA AQGGTCTTGA GCTGTCA^ CTOAOTTC 
31201 CTTATTTTCA CAGGT6AGAA TCCTGAGGCT CAGCTGTTGA GATCTQCTGT 
31251 CTCACTCCGG TGACAtAGTA CAGTGGATCT mmGCAG CCAAGCACAC 
31301 ATAGCnCAC ATTCCAGCTC CATCAATTAT GTATTGGGCA G^ 
31351 ATGATTTGAC TTTAACTCTG CTTTTCAGTC TTCTGTAAAA CAGQGATAAT 
31401 CCTCCTACCG TAGGGHGTC AGGATTAGAG ATAATATAAA TMOffWCCT
31451 CATATAGGAC CIGGAHATG GCTGGCATTC AATAAATAGT AQCTGTTAAT
31501 TGATAGCTAA GCTAGAACTC TGAAGTCTAC CATGGCAACT TCTTMGTGG 
31551 TCTGAGAACC CAGnGTGTT CTGTGGCAAA ACACAGCTTA GGGATCCATA 
31601 CCCAGCCCTC CTGTCAGCTG TTCACCHCC AGHCHCAG A^GTCT 
31651 GGCAGTGACT TTGGCCACAT AGCTGGCTGT GCCCTTTAAA G^TTCCTT 
31701 GACACAGATA TGTCGACTGG TGAC6TTGCT CTCCAgCAG CTCTTCTTjCC 
31751 CAGCAG6CTG GCCTGGCTGT aCCTGCATG CCT6TACTT6 TTT6TCTCCC 
31801 TGCTCCCTCT CCTGGGCCTG GCCAGAGCTA CHGCAGCAA ACAAAAGCAG 
31851 GATATTGQCA ATGGAAAGGA GQ6TGT6TTC TGGTGCTCCC ATGCCCTGCG 
31901 GCGCACATAC CATTGCAAGG GC6TAACAGA GCCCAGGCCT GCATTTGGGT 
31951 KMATMGT CtGCACAtAG MGAAAAGAA GGACCTGGTG ACCAGGAGCC
32001 AfGGAACCCT TGTGCTCGCC TACCTGG«:T ACTGGTTCTT GCCACTCCTA 
32051 CCATTTTCAG TTTGGAAATA TTTGnAAGG CTHGCTCTT CCAGGTCCTT 
32101 TGCTTGGTGC T6AGTCTACC AAGAGTAAGT GGGATGCTGT TTTTGTCCTC 
32151 AQG6AGCTAA CAGTCTAGTG AAGAAGAAAG ATGGTTGCCC AGGAAOTCT 
32201 A^^ GCAG6AGQCA AGAAQGAAGC CCCTGCTCCT ACjaCA^C 
32251 CTCTGnGGG CACCCCATAG TTOTCAGAA CCACAWAA TCCTC^GC 
32301 AGGCCAGGCA TAGTGGCTCA CACCT6TAAT CGCAGCACTT CgGAQGCCA 
32351 AGGCGGGCAG ATCACTTGAG GTCGQGAGTT CGAGACCAGC CTCACCAACA 
32401 TGGG6AAACC CCGTCTCTAC TAAAAATAGA AAAATTAGCC GGGnnGGTiG 
32451 ScGCCA GTAATCCCAG CTACTCAGGA GGCTGAGGTG GGAAAATCAC 

FIG. 3-13



U.S. Patent Jan. 22, 2002 Sheet 19 of 41 US 6,340^83 B1



32501 TTGAACTCGG GAAGCAGAGG nGCAGTGAG CCGAGAHGT GCCACTGCAC 
32551 TCCAGCCTGG GCGATAAGAG CAAAAHCCA TCTCA)\AAAA AAAAAGAAAA 
32601 AA6AAAAAAT. CCTCACTGCT ACCTTGAAAG TAGGTGATGA CATTGCCATT 
32651 TCACAAATGA GAAGTGAAGG GGCTAQCCCA AGATCACHA GGTCGTAAAT
32701 GGTGGTGCTA AGAHAGAAC CTCAGATCAT CTAGG6AAAA ACACAGATAT 
32751 GCACAGASTT MGGGGACCC AGQGTATFGT TTGTCCTCTT GTTTCAeAGG 
32801 TGGGGAAACA ACCCAGAGAG GGAAAGGGGC TTGTCCAAGG CAATTTAfiCA 
32851 CCCAA6AACT TGAACCCATA TCTCTCTCCT CCTCATTTAG AGCTCATCCC 
32901 ACATGTATCT TATATTGAGA GGAGTCTGAG CCACATACCA AGAACAGTCT 
32951 TCCCCTaGC CTCCAACCTC ACTGTGCAGmTGAGACAC tTCACAOeCA 
33001 TACTCTTCAT GCCATACCCA GCCCTTAAGA CCCTGAAGTT.CCCCTTCCAT 
33051 AAGACAA6TA GGAAAAGCTA TAGGGTAAAA ATAGCCATCA GTGTTtGjtG 
33101 AGCACCCAGG AGGAAnGGG CACTCCAGAA AGATAAAGGG ATTCtCAGGG 
33151 ACTTGCTTCT CTAGACHCC CTAGCTCAGC TGCTtCAACt CATTCCTGCC 
33201 CCTCnCTCT ACCTCCCGCA GTGCTCAGAA GTAGTAGAAC TCACTGTGGC 
33251 CTCTCACCTT GCAHGnGA GmTATTTA GACTTTCTCT TCCTCAACTC 
33301 TTCATAAGCT CATGAAAGGT GAAGTAG6GT GCCCTGTGTA TrTATCTTTT 
33351 ATATCTGCAG TGCHAGCAA GTTATAATAA TGCACTTGCC TGGCAAAAGG 
33401 CTTTCTCTCA TACATTAGCT TATTTCCTCT TCACATTGGC TCTTTGTAGT 
33451 AATAGGATGC TATTAGTTAT TTTCAATGAG AGAAAGCtAC TAASAGAAGT 
33501 TGTCCAGCTA GTGACAGTAA G7GGCT6ATA AAGT6AGCTG CCATtACAn 
33551 GTCATCATCT TTAATAGAAG TTAACACATA CTGAGTTrCT ACTATAtTGG 
33601 GTCIIIIIII 1 1 1 1 1 1 1 II I IIIIIIIIIA GAGACGGAAT CTTGCTCTGT 
33651 TGTCCAGGCT GGAACGCAGT GGTGCAATrT TGGGTCACCA CAACCTCCGC 
33701 TTCCCAGGTT CAAGCGATTC TCCTGCCTCA GCCTCCTGAG TAGCTCGGAC 
33751 TACGAGTGCA CGCCACCACG CCCGGCTAAT TTTTGTATTT TTAGTAGAGA 
33801 CAGGGTTTCA CCATGTTGGC CAGGCTGGTC ttgaactcct GACCfreriGA 
33851 TCTGCCCGCC TCAGCCTCCC AAAGTGCTGG GAHACAGGT GTIGAGCCACC 
33901 GCGCCCTGCC TATATtAQGA CTTTTATATA AGCTATCTCT AGCTAfiCTAiS 
33951 CTAGCTAGCT ATAATGTITT TTGAGACAGA GTCT6ACTCT 6TCACCCAGG 
34001 CTGGAGTGCA 6TGGCGTGAT CTCGACTCAC TGCAACCTCC ACCTCCTGGG 
34051 TTCCA6TGAT TCTCCTGCCT CAGCCTCCCG AGTAG CT66G AHATAGiGTG 
34101 CATGCCACCA CGCCCAGCTA ATTTnTGTA TmTAGTAG ACCAt^TTTC 
34151 ACCATGnGG CCAGGCTGGT CTCGAACTCC TGACTTCAAG TGATCCACCC 
34201 GCCTCGGCCT CCCAAACTGC TGGGATTATA AGCATAAGCC ACtGTCCCCA 
34251 GCTGCTCTCT ATATTTTTAA TACATATTAT TTCCATTAAT TTTCACAGCA 
34301 GTTCATTTTA TAGATGAGGA AACTAGGCCA GAGAAGTAAA ATATCTTQCC 
34351 CAAGATGATG TAACTAGTAA GTGGCAGGAT CAAGATTCAA ACCAAGCAAT 
34401 GTTCAAACCT CnGGAAQCA AGAATGTGGC CACTGTG6AA GGTGCAAGGC 
34451 CTTGACAACA AGAATAGGGA AAAGAAGGAA CTAGAAGGAA AGAGATGGCA
34501 TGGGCTCAQC AGGCCAGGGA GCTCTTAGCT GTGTGTSTTG^QSAAGGTCAG 
34551 AAGGGAGGAA GAGGTTGTCT GTGCAGGTAA GTCCTGAGAA O^y^CCAGAC 
34601 TnTGAGAGG TGGAGCTTCA TAGCC AGGTC AHAGGGG AG AAGGGAGCTA 
34651 TAGAIIIIII llllllllll llllllllll 1 1 1 1 1 1 1 lAG AGACGGGGTC 
34701 TTACTATGn GCCCAGGCTG GTCHGAACT CCTGGGCTCA AGTGATCCTC 
34751 CCACCTCAQC CTCCCAAAGT GCTGGGATTA GAGGCATCAG CCACCCCGCC 
34801 CAGCGAGCTA TG6ATCTAAC ATGTACATCT TACACAGTGC TAATAGAATG 
34851 TTGGGTTTCT TCCCCAATAT TTTATTnGA AAAAAAATTC AAATATATAG 
34901 AAAAGnCAA AAATGTAGH CAAAGAACAC CTACATACCT TTCACATAGA 
34951 TTCATGATTT GTTAATGnA TGCCACTTTG TATATATCTC TCTCCCTCCT 
35001 ATCTGTATAC TTTTATTTAT TTATTTTTGC TGAACTATTT CAGACTAACT
35051 TAAAGQCATC TTGATnTAC CCTTGAACAG nCAATATGT TTCTGCTAAG ' 
35101 AAHCTCCTA TATAAGTCAG ATATCATTAC ATCTAA6AAA ATTCACGGCA 
35151 ATTTTACAAT ATAATATTAT AG TCCAAATC CATATTTCCT CAGTTSTrCC 
35201 AAAAAAT6TT CATGGCTGH TCCTTTmA ATCT AMm fiAATGCMGT
35251 TTGAGGCAn GTA7TTGGTT GCTGTC TCTC TAGGGTTT7T AAAATCTSTG : 
35301 CCTTTTCTTC TCCCCATGAC TTTTTAGAAG AGTC AAGACC GGTtATTCtt 
35351 ATAGAATAAC CCACATTCTA GATTTGCCTG ATTAGmTT TTATACTTAA 
35401 CGTATmTG GCAAGAACAT TACATTGGTA ACGCTGTTGG TGATGGGTCA 
35451 GTTTTGAAGA GTGGA6ATGA TTAAACTQCT TTTGTTCAn GAAGTAJCTG 
35501 TCAAGACCAG AGATCCTTAA CTGGTGCCAT AAATAGGTTT CAGAGAATCC 
35551 TTTATATATA CACCCTGTCC CCCACCTAAA nATATACAC ATCTrGTTfA 
35601 TATATTCATT TtTCTAGGGG AGGCTTCTTG GCTTTTATCA AATTCTGAGA 
35651 GGGCCCCAA6 ACCCAAAGAG GTTATGAAAC ACTAGTCT6T CCACtGAGGC 
35701 AG6CAACACA GAGCTGGTTT CTGGGGCCn GTTCAGTCTG AACCAGCHC 
35751 CCTTGGGGAG ATAQCACAA6 GCTGTAACTT TGCCCCATCT TQGCT7TGGA 
35801 TCAAAGAGGA CTGTrCCATTT TGTTGTCATA CCTAGGAACC AQGGACAGCT 
35851 TATGTGGCCT GGHCCAGGG ATCCAGGAGA ATTTCASTTC TTGTCTTGCC 
35901 TTTCAGGTGT TCAGAATGCC AGGATTCCCT CACCAACTGG JAQATGAGA 
35951 AGGATGGGAA GCTCTACTGC CCCAAGGACT ACTGGGGGAA GTTTGGGGAG 
36001 TTCT6TCATG GCTGCTCCCT GCTGATGACA GGGCCTTTTA TGGTGAGTGA 
36051 ATCCCTTCAT ATCTGCCCCT CTTGGTCTTC AGAGTCCATT GACAG7GCTT 
36101 CCAGTTCCCT GTGQCCTGTT AATCTTTTAG TCmCCATC AGCCAGGGCA 
36151 TCTCCCTTTA nTATTCATT CATTCAACTA GCAGGTATCA ATTGAGCACC
36201 TACTAAGtGA MGGTAAGAT CCTTCCCTCA AAGACHAAT AOT 
36251 TGG6AGTGGG A6GAGAGQCA GGCAGAGAGG AGACACAATA TAG7TGGATA . . 
36301 AGGACCTCCA AGGAGAGTGT TACAGGCTGA GAGGAGGATA tACTTAGGTT 
36351 GTCTTTAGGG AATCAGAAAA GGAGACTCTG 6AATAGGCTG GCAGAGAGAG 
36401 GGGCTACCTC OATACCTGC TCTGGACAAA C GACTUAA G CATAGTGACA 
36451 6ATTTGCCAA CCCTGTATTG 6AAGAACTGA TCTTmTAG TGGGGAtGAT 
36501 TACTTCTGGG 6ATTTCTTCT CATAACTGAG ACCAAAACAG TTTTGTGCAG 
36551 TCTCAGAAAT GACAGGAGGT ACCAATCTGA CACnCCTTT GGAAGCTCJA 
36601 GGGCAGAGAG TGAAAGAGTG GATTTTGACG GGGGCCTTGC TTGGAGGTCA 
36651 TTCACCCACC CCT6TCCTCA CTCCAGCAAC AGTGATAACT CACTTCCTTC 
36701 CTCCCTTTGT ACACCCTTCT CCCCACCTGC TCACAGGTGG CTGGG6AGTT 
36751 CAAGTACCAC CCAGAGTGCT TTGCCTGTAT GAGCTGCAAG GTGATCATTG 
36801 AGGATGGGGA TGCATATGCA CTGGTGCAGC ATGCCACCCT CTACTCGTAA 
36851 GATAGTGGTC CTTTGTCTAT CCTCTCCCAT ATAA6AGTGG CTGGCISSGfiA 
36901 GGGACAGTGG CAGGSTGAGT TGGGCAGAAG GAGTGTrAGG GTAGTCAGAG 
36951 CATTGGATTC TTACCACAGC AGTGCTCTTA ACCAGCTCTT TAACTTCTAA 
37001 GCAGAATGAt TTACACATGr CTCTACCm" WCCnACC MCCTTC^ V 
37051 AT6TCTTCAC TCTGCCCTGC AATCCTCCCA GTGGGAQGCA CTCtTCAAGG
37101 ACGATCCCAG AACATTAAAG TCAAAGACCC CTTAGAQCTC ACCCTGTCCA 
37151 ACCACCTTGG TTGATAAAAG AAGTCAGCCT GGGQCCCATG GAATAGAATA 
37201 GTACAAGGGC AAGGTTCTCA TTGTGAGTCA AAGGTAGAGT GAAGAGAACC 
37251 CAGACCATCT CACCCCAACC CAGGCCAGTG TTTTTCCAAA TATAOCACTT 
37301 GCTGCAGATC TAGCTCAGCA CCCCCAGTCC CAGCCCACCC TGA6AACCCA 
37351 GGCTCCTCAT TCTGAGCAGC CAGCTA6AAT CATGACAAAG AGGGT(3GTAG 
37401 TGAGACTAT6 GGTACTGHG CTTAAAGCCA CATGGTGCAG TGGtTGCTGG 
37451 GGGGCTTCTG TGTGGGACTC TAGCATCTTA HCCCCCCTG TGCCCTCTCC . 
37501 CCA6TGGGAA GTGCCACAAT 6AG6TGGTGC TGGCACCCAT GnTTGAGAGA
37551 CTCTCCACAG AGTCTGTTCA GGAGCAGCTG CCCTACTCTG TCACGCTCAT 
37601 CTCCATGCCG GCCACCACTG AAGGCAGGCG GGGCHCTCC 6TGTCCGTGG 
37651 AGAGTGCCTG CTCCAACTAC GCCACCACTG TGCAAGT6AA AGAGTAAGTA
37701 TnTGAGAAC CCTTCAGCAG GGGTTCTTGA GCAGA6TCTG TAAATGGGCC 
37751 TCAGAGGGCT TAGACCTCCA AAGTCTCAT6 CAGAACTCCC TTTAlTCTCA 
37801 TCTCATATCT TTCTCCTGGA CCCCAaATG CTGfAACCGT ACCTGGGCa 
37851 TGGCACHAC TGTTCTCTCT GCCCAGGCTA CTTCCTACCC GATACTTAAG 
37901 GCAAGAATCA CTCACCTTTC AGGTGTCAGG TTTCAGGTCA TGT7TGCTCT 
37951 TT6AAATCAT CTGGdTGAT TAtGTGTATT AGlTGnTAT CTTCTATCCC 
38001 CTCCACTAGA ATGTAAAnC CAGAAGAAAC TTGCTGTCn ATTCAGTQCt 
38051 GCATGCCCAG GGCTTGGAAG AGTACCTGGC ATATAGTAGG AGTTGATT6A 
38101 TrAnATTTT GTCAGTCGAG AGAATGAATG GA6AAAAT6T GGTCCATGGC 
38151 CCAAAAGAAG TTAAGACCCT ATCCTAGATT CAGQCCAGAG ACCA6ATGGA 
38201 GAAAGAGTCT GTGTCTATCT AATACCAGTA ATGTCGTACC TCTGGCCGCT 
38251 TACCATGTAA ATAHGAHG TGTATCTACC ATGTISTTGGA CACTAGGCTA 
38301 GTGCTTGCAC AQCAGGTGAA A6ATACTAGA GTrTGGGAAG JCAGGAQGAG 
38351 CtAAGGtCTG TTCTACAACC TTATTAGATG-AAGAGGAGAG GGAATTGTGT 
38401 TCAGGGCAGA GGGA6AAGCA TTTCTCCAAA AGTAG6AGTC TTAATCATGr 
38451 CTGATGTAGG TT6AGTGTGG CCAGAAAA(3G GQCTGHAAG TATAGAQ(36C 
38501 CTGGATTATG AAAATCCAGC AGATCCAHG AGAGTTrAAG CASCAAGGTG 
38551 TrGTGACCAA GTTAACATTT TAGAAQGATC ACTGGTATGG AGGTrGGATT 
38601 GGAGAGGG6A AAGCCTAAAG GTTATAGAGAC TAGTTAGGAA GCTATTSTAG 
38651 GCTGGGCATG GTCGTTCATG CCTGTAATCT CAGCACTTTG GGAQSCT6AG 
38701 GTGGGAGGAT tGCHGAGGC CAGGA6TTGA AGACCAACCT GGCCAACATA 
38751 GCAAGACCCC GTCTCTGTTr TfCTTAATTA AAAGAAAA6T CCAGACGTAG
38801 ACATAGTGGC TCACQCCTGT AATGCCAGCA CTTTGGGAGG GCAAGGTGGG 
38851 CAGATTGCTT GAGGTCAAGA GTTTGGGATT AGGCCAGQCG CAGTCGCTCA 
38901 CGCCTGTAAT CCCAGCACTT TGGGAGGCCG AGGTGG6CGG ATCACAAGGT 
38951 CAGGAGATCA AGACCAtCCT GGCTAACACA ATGAAACCCC GtCTCTACTA 
39001 AAAGTACAAA AATTAGCCGG GCATGGTGGC GGACGCGTGT AGtCCCAGCT 
39051 ACTCGGGAGG CTGAGQCAGG AGAATGGCGT GAACaAGGA GGCGGAGCTT 
39101 GCTGTGAGCA GAGATCACGC CACTGCACTC CAGCCTGAGC GACAGAGCGA 
39151 GAaCCATCT CAAAAAAAAA AAAGA6TTTG GGATTAGCCT GQCCAACATG 
39201 GCAAAACCCC ATCTCTACAA AAAGTACAAA AAAATTA6CT GGGTATGGTG 
39251 GTGCGCGCa GTAATCCCAG HACTCAGGA GGCTGAGGCA TGAGAAHGC 
39301 TTGAGCCTGG GAGGTGGAGG TTGCAGTGAG CCCAGATCAT GCCACTGCAC 
39351 TCCAGCCreG ATGACAGAffT AAGATGCCAT CTCAAATAAA AATTAAAAAC 
39401 AAAGTHAAA AAAAAAATAG AAQCTATTAC CGTGATCCAG GTAAGAGATG 
39451 TGAATAACTA CAAT6ATGGA AAGAAQGCAG AGTTCTTAGA GATGGGAGTA 
39501 GGA6AGATGA GGGAACTCCA 6ATTGGGAAG ATGATGTTCA AGTfTCTGGC 
39551 TTAGGCCACA GG6TGAGTC6 CAATTCCCTT GACTGAGATG GGGCATCCTG 
39601 GAAAAGGTGT TGCCTTTCTG TGTGGGTATC C7GGGCCCCT TAGQGGCCAC 
39651 TGGTGGCCTG GGACCTG6TA AACCTTCCCT GCACAAGCA6 AATTGGTCAA 
39701 GCAGGI I i 1 1 AGGACATCTT TACCCTGCCT CAACTCHGT CTGQCCCAGG 
39751 GTCAACCGGA TGCACATCAG TCCCAACAAT CGAAACGCCA TCCACCCTGG 
39801 GGACCGCATC CTGGAGATCA ATGGGACCCC CGTCCGCACA CTTC6AGTGG 
39851 AG6AGGTAGA 6TGTGTGTCT AATCTGTCn GTG AGGGTG G GACATGGAAC 
39901 A6ATCCTCTG GGAAATCAGG CTGTAGCCTT TACCTTITCC TACCCCCAGC 
39951 CCATCTCTTT GTCTTAGCAT TGAGCCTGT6 ACCACTGGTG ACCTATTTCA 
40001 GCGTAACA©3 TTCCCAGGGT AGCAGGGAT6 6FGATGGAC GGGAGAGCTG
40051 ^AGGATGCC AGGCAGAGGG CACTGT6AGG CCACTGGCAG CTAAAQGCCA 
40101 CCAmGACA AGTTCAGCAC TGGCCACACT GTGCCTGAGT CATCTQGGTT 
loiS^CAToS GGCCTGGGAT GGQGCAGCCT^^ 

40201 Sg^CaSg ffSGAGGATG CAAmGCCA tSACGAGCGAG ACAOTCAGC
40251 tSndA aScCC GTCTCCa\AC QCCT66ACCA GCTCC^^^ 
40301 PAGG^CGGC TCGCTCCTCA CATGCAGAAT GCCGGACACC CCCACGCCCT
40351 Sc^ KaAGG AGAATCTGGA GGGGACAaG AGGAGACGTT
40401 CCCTAAGGTG CCACCTCCCA CCCTGQCTCT.GTTCTCTCCT ATGTCTGTCT 
40451 CTC^Am aTGAKTGG CmCAGAAG CCTGCAGAGT TAGGAAAfiGA 
40501 aStggc CAGGGACAGA CTATGAGGAT TGTGCTGACC CAGCTGCCCC. 
40551 TGtStC ACAGmACA GCCAGAGCCT 6TGCG6ACCC AQCTCTCTGC
40601 CAG^CT TAGAAACCTG AGAGTCAGTC TCTGTCCAa GA^CTAA 
40651 GCTGGACAGG AGGCAGTGAT GCTAAACCCT GAAGGGCAAC AT^CTATG
40701 Skat SgAGCTCAGA GCCTGGAGTA CGGGCA^GA TAGGATTGM^
40751 TAAATTCTGT AGAAA6ACTT TGAAAACAAT AAAGGAAAAG AT6AAT6AAC 
40801 GIMI 1 lA ScTTGAGGG ACCAACAACC CCCAAACCCC AGATTCTgC 
40851 AGGTCCATGG GGAAGGAGAA GTTGCCTTGA GTGGAAGCCC CAAGTAGGGA 

40901 SaS^Stcaagag^ct^:C^^^^^

40951 TACCCTACTG GGGCHCAGG CTGAGCTCCT CCCHCACAA ATCACTTCAT
41001 CTCTCT^QC CTffmCTGC ATCTGTGACA TAAGATGGTA A6ATAAAGGT

41051 gStgtctca CCMHAIGT AAGGATTAAA tgtggaaaag gacawast

41101 TGTATAGTGC TGCCATAGGG ACAGIGHCA GTAAACGTGA CACATTCTTA 
41151 CTATCACTM GAATC^^ 

41251 TTTGAGACCA GCCTGAGCAA CATAGTGAGA GACTCTCTCT ACAAAMAAA

41301 AATAATAATA ATAATTGTIT 7TAATTA6AT GGGCA6GGCA CTCTGeCTLA 
41351 CACCTGTAAT CCCAGCACH TGGGAGGCCA AGGCCGGAGG ATTCCTTCAG 
41401 G^SctT eX^GC CTGGGCCACA nCCTGTCTC JACAAAGMT
41451 AAMAAGTTA ACTGGGCATG GTGGCACATG CCTGTAATCC CAGCTACTCA
41501 aSSS^TCAG ^Stf KCTGAGCCC AGGACTTCAA. GACTGWGTC
41551 A^tTTGATC ACACCACTGT ACTACAGCTT GGGCAACAGA GTGAGACCTT . 

41601 GTCtciMAA AAAAAAGTTT GTrmTTTT ATqCACTCTC CTCACCAA^
41651 AAACTGAGTA AGHAGAGCC CTCTCAGCTG GCATGIGHG GAAACAGT^ 
41701 CCTCTCATTA MGTGCTGCC CTCACTCCCA TTGCCTCTTG GCCirGGTCA 
41751 CTATG^GAA A^ACTQGGA GGCAGGGCAA CAGAGGGCAG GGAAGAGCTA 
41801 SSre Si^GGAAAA ^GATTT G^^f ^A^CT
41851 AGAGCCACCA TGCAGAGGAG GGGGGCAGCT AGCCTTGTGT GPTCTCCTOG 
41901 GCATGCTCAG (y^GGAGQCAG AGCA^^ 

41951 GGtCGGGACA AGCCAAGAGC CATCCAGCGT CAGTGCTCTC TGGGiTAG^^

42001 MctK^gSw CCAGAGAGAA AGHCGCAGG qCTOTCACC
42051 TGCAGTGCT6 TGGACTTCAA CCncnGTT CCTTCTTCA6 JAACTGAAAA
42101 TAACAGTCAT TGACCATGAC TATTATCGAC CGCTTTTGAA AAFGTAAACA 
42151 TAGTGACm ATTGCTGTAA AAATCATACG TGTTTATCAT CTTAAMTTC 

42201 SicAGGTACA AAGATGTGCA AJATATCATC CAAm
42251 TTTGCTGGCC AGGCACGGTG ^[C^CT GTAATCCCAG CACm^ 
42301 GGCCGAGGCG GGCAAATCAC TTGAGGTCAG GAGTTTGAGA CCAgCTGGC 
42351 WmmG AAACCCTATC TCTACTAAAA ATACAATAAT TAGGCTGQGC 
42401 JSaCTATAA TCCCAGCACT TT^GGCC ^GGKG

42451 AATCACAAGG TCAQGAGfTn GAGACTAGCC TGGCCAATAT G6TGAAACCC 
42501 CATCTCTACT AAAAATACAA AAATTAGGGC CGGGTGTGGT GGCTCACGCC

42551 TGTAATCCCA GCACTTAGGG AG6CCGAGAC AGATG6ATCG CGAGATCAGG 

42601 AGHCGAGAC CAACCTAGCG AACATGGT6A AACCCCATCT CTACTAAAAA 
42651 AATACAAAAA TTAnCGGTT GTGGTGGCAC ACGCCTGTAA tCCeAGCTAC. 
42701 . nGGGAGGCt GAGGGAGGAG AATCTCnGA . ACCTGGGAGG CAGAGGTTGC 
42751 AGT6AGTGGA GAtCCCGGGG treCACtCCA GCCTGGQC6A CAGAGTGAGA 
42801 CTCCATCAAA AAAAAAAAAA AAAAAAAAAA AAATTAGCCG iSGCGTGGTGG 
42851 CGTGCACCTA TACTCCCAGC TACTTGGGAG GCTGAGGCAG GAGAATCGCT 
42901 TGAACCTGGA AGGCGGAGGT CGCAGTGAGC eGAGATCGTG CCATTGCACT 
42951 TCAGCCTGGG CGACA6AGC6 AGACTCT6TC TCAAAAATAA TAATAATAAC 
43001 AATAACTAQC CGGGCCTGGT GGCACATGCC TGTAGTCCCA GTTACTCAGG 
43051 AGGCGGAGGC AT GAGACTCA GGTGAACTAG GGAGACAGAG (GTtGCAGTGA 
43101 GGCAAGATCA CACGACTGCA CtCCA^G GTTGACAGAG CGAGACTCTG 
43151 TCTCAAAAAA AAAAAAATCC CATTTGCTCA 1 1 1 1 1 IGGAT ACTAGTATAA 
43201 CTATCACTCT AAACCAGm GTACTTAAAT CAAGCAGATA TGGGAGATCG 
43251 TGAATTACCA TCTACAGTGT TGTCATATAT GTCACATACT GAGCATTATC 
43301 AGCTAGTAGA ATCTAGTTAA TTGTrCTATG tGTGATGTAT GCAGAGTTCC 
43351 CATTTTGAAT GTGTTTTTAC TATGC7TAAA TAAATGACT6 ATGTCAQCAA 
43401 CCdCAAAAtG ATACATCTGA TGTAAGAGCC CCT6TTCCCC AATAATAACA 
43451 TCTAAACTAT AGACATTGGA ATGAACAQGT GCCCCTAAGT TTCeTCCCTC 
43501 CAGGGTrrCT TGGCCGGTCT CTGAGGACTA CACATCCCTA CTCCCGTCTT 
43551 tCCTCATCTT CAGGCGCAGT AACAGTATCT CCAAGTCCCC TGGCCCCAGC 
43601 TCCCCAAAG6 AGCCCCTGCT GtTCAGCCGT, 6ACATCAGCC GCTCAGAATC 
43651 CCTTCGnGT TCCAGCAGCT AHCACAGCA GATCTTCCGG CCCTGTGACC 
43701 TAATCCATGG GGAGGTCCTG GGGAAGGGCT TGTTTGGGCA GGCTATCAAG 
43751 GTGAGCGCAG CmCmJG CTTTGCTCTT CTGCCCCCAG TCCCTCTGTC 
43801 ACTGTCTrrC GGGGATTTCT CATCACTTGG CCCCACCCCA CACCATGCAG 
43851 GATGCCAGGC aCCTtCCTG GCTTrGGGTG TTGGTGTGAG AGGTATCCTT 
43901 CACCCCCACC CAGGCCACCT AAGGTCAATG TrGCTGTTAC AGTGAGCTTG 
43951 TG6ACCTGGA GATCCAGGTT GGGTTGAQCT 6TGCCTGTGG CCCTCCTGCC 
44001 TCCAGTCAGT GGGtGTTTGT TA6GTGCCTG CAGACCTCAG TACCGGGCAT 
44051 GCTACAAGGA GCACACAGGG GAATGGCTCC TGCCTCCCTG GTGAACAGTC 
44101 TCAGGGACTA ACCTCTCTCT TTCTCTCCtC CTCCTCCTCT TCTGCTGAGA 
44151 ACT666AQGG GGGGTCAGGT AAGACGTGTG TCTCAGCTTG GGQSCAGCAG 
44201 GGCTGGAGAG CTCACCCgCG ATCCACCCAG CTCCCTGGTG CATGTCTTtG 
44251 GCACTGACCT TCCTGCCCa AGACTTCTGT TCACTCAGGA GACTCACltC 
44301 TATGCCAAAT GACCAGAGCC CCTGCnGQC TTGGCAGCAT CCCCTCCTGC 
44351 CncnCCCC ACTTCCCTTT TCTGGGTTCT TGCCTGTCCT CTGTGCATGC 
44401 CGAGCTCTCC AQGAAAGAGG GTTTGCTrCC GTGTGAGTCC CATCTTIGCTC 
44451 CACGCTGCAT CTTCCACACA TGAACTCTGT CAHCTGACC CGQCTCAGTG 
44501 TGGCGTCCAA GGGATGG6AT GGCCAGCTCe ATAGATTTTC TCAAACAGTt 
44551 CTCCAGAACT TCCTCTGGTC TCAGCACCAT. TAACAGTCAC CCtCCCTGTA 
44601 GGTGACACAC AAAGCCACGG GCAAAGT6AT G6TCATGAAA GAGTTAATTC 
44651 GATGT6AT6A GGAGACCCAG AAAACTTTTC TGACTGAGGT AAGAAGATSG 
44701 AGGGGGCCCG GGAGGTTGGT GTtACCATTG GAAGAGA6AA GACCTTACAA 
44751 ATAATGGCTT CAAGAGAAAA TACAGTTTGG AAHACTGrC TTAAAGACTA 
44801 AGCAGAAAAG AGCCCTAGAG 6AATATCCCA CTCCCTCTAA ATTACAGCGT 
44851 AAnATTTGT TCAATGAACA CTTACTAAAA GCAACACAAA CAGGGTACAA 
44901 GGGATGCAGT AACAAAAGAT ACAGGGTTCA 6AAGAGCTCT CAGGTTATBA 
44951 GGATGATGGA CATGAAAACA CTCCAATTTA GTACAACTCA ATGTTATAAT 
45001 CCTCACCTGA ACGCCCTGCT AAGG6AGCCT GGAGGGGAGC TCCCTGAGCA
45051 CTCACACTa nGGGCATTT ACAGTmCA CTACCCCTCC CAAGTTACTT 
45101 CATG6AGTAA CTTAAGTTGG GGACACCTGT GGTCTGGGTA TTGCCCTCCA 
45151 AGCCACnGG CCACTCCCAC CCCAffTTCTC CCAATGCAGT TCCAAGGGTA 
45201 AGGCCTAT6A AGCCATCTCC ATCTATATGG TGGTGGTCTr CCCtCATCCt 
45251 GATCTTAGTG CCCTGTCAfA TCACAA6ATA GGAGSTAQ6A GATACAffiTG 
45301 GTAACACTTG TCAAGCTGAT TCCTTGGAGG GAAGAGGTAA GGAA6ACAGT 
45351 GAGAAGTTAA CCACCAGCTT TCCTTGGCn CCCCCACCCC CAGGT6AAAG 
45401 TGAtGCGCAG CCTGGACCAC CCCAATGTGC TCAAGTTCAT TGGTGTGCTG 
45451 TACAAG6ATA AGAAGCTGAA CCTGCTGACA GAGTACATT6 AGGGGQGCAC 
45501 ACT6AAGGAC TTTCTGCGCA GTATGGTGAG CACACCACCC CATAGTCTCC 
45551 AGGAGCCTT6 GTGGGTT6TC AGACACCTAT GCTATCACTA CCCTAGGAQC 
45601 TTAAAGGGCA 6AQGGQCCCT GCTTTGCCTC CAAAGGACCA TGCTGGGTGG 
45651 6ACTGAGCAT ACATAGGGAG GCTTCACTGG GA6ACCACAT TGACCCATGG 
45701 GGCCTG6ACC ACGAGTGGGA CAGGGCTCAA CAQCCTCT6A AAATCATTCC 
45751 CCAHCTGCA GGAtCCGnC CCCTGGCAGC AGAAGGTCAG GTTTGCCAAA 
45801 GGAATCGCCT CCGGAATGGT GAGTCCCACC AACAAACCTG CCAGCAGGGC 
45851 GAGAGTAGGG A6AGGTGTGA GAATTGTGGG CHCACTGGA AGGTA(3AGAC 
45901 CCCTTGCTAT GCAACHGTG TGGGCTGGGT CAQCAQCTAT TCAHGAGn 
45951 TGTCTGTGTG ACTGAAACTG ACCCCAGCCA ACtGTTCTCA GTTCACAGCC 
46001 CTGTTTTCAA AGAATTACAC ATCTCTAAAG GCAAACAGG6 CACGGACAAG 
46051 GCAAACTG6A GAGGCAAACT GTAGCCTGAG ATQGCCtGGG CTTGCCATCA 
46101 CAGGTAHCA G6TGCT6AGG GCCCHAGAC CAAGTAGAGC ACCTCACTGC 
46151 CTAGGAAATC AATGAAGGGG AAAT6AGTTC TAQCGGAGCC CTGAAGGATC 
46201 AGAATTGGAt AAAGTTCnA TTGGCAGAGA GGCACCAGGA FGAAGTGAC 
46251 AGGAGCAAAG ACCTGGGAGG AAAGAQGAGA AAATCATCTA TTTCAGCTGG 
46301 AAACAAATGA HCCAAGCAT AGAAATAATA ACAGCTSACA AGTACTGAGT 
46351 GCCCTCTATA TGCTAGGCAC TGGGCTGAGG GATTAACAT6 CATGTGCA7G 
46401 TTTAnCCTC AT6ACAACCT TGGTTTCCAG ATAAGCTGGA CTGGAAAGGG 
46451 ACA6AQCTGG GATCCTGGGC TAATCA6TCT 6GTCGCCAA6 CCTGAGACH 
46501 TAGCCACTGC CCTTCACATG GGG6TCCATG AAAATAGTAG TAGTCTGGAA 
46551 CAGTTTGGGG GTACATCAAG GTC GCTGTGT TTTAAGCTAT GGASTCT GGA 
46601 CTATAQGAGA CAAATGTAAA AGAtillllll GGTTGACTGG CTmTGGTT 
46651 llillGIIIG-4-ll(alllGTT TGTr TGTTTG 1 1 Itil I IGH T TTTCCT Sn 
46701 TCTGGGGCTT GAATCAGGAA GGAGGimT TrGTTGTTGT TGTTTTGAGA 
46751 AAGGATATTG CTCTGHGCC CAGACTGGAG TGCAGTCGCA CGATCATGQC 
46801 TCACTACAQC TTCGACCTCC TGGGCTCAAG CAATCCTCGT GCCTTAG CCT 
46851 CCCAAGTAGC TGGACTACAG GTGTGTACCA CCACACCTAA TTTTTTGAAT 
46901 I II IIIIIC T llllllllll lllllllill GGHTAGAGACA GGTTCTCACT 
46951 nGTTGCCCA GGCCTGAATC TCAAACTCCT GGGCTCAAGC ATTCCTCCTG 
47001 CCTCGCCCTC CCAAAGTGTT GGGATTACAG nGTGAQCCA CCATGC CCGG 
47051 CAGGAAAAGA mTTAAGCA AGAAAGCTTA AGAGCTGTGG TTTT TCCAAA 
47101 ATGAGTCTGG GCTGGCACAG TCQCTCATGC aGTAATCCC AGCALlllli 
47151 TGGGAGGCCG AGGT6AGTGG ATCACHGAG GTCAGGAGH TGAGACCAGC 
47201 CTGGCCAACT GGT6AAACCC CTGTTTCTAC TAAAGAAAAA AATGCAAAAA 
47251 nAGCTGGGC GTGGTGGTGC ACGCCTGTAG TCCCAQCTAC TCAGGAGGCC 
47301 GAGGCAGGAG AATA6CTTGA ACCTGGGAGG CAGAAGTTGC AGTGAQCCAA 
47351 GATCACACCA CTGCAnCCA GCCTGG6TGA CAGACT6AGA CTTCATCTCA 
47401 AAAAAAAAAA AAAAGA6AGA CTGATATGGT TAGTACAHG GGGTGGAATG 
47451 CGGAGGGTCC AGG6AATGGA GCCCTGCATA GGGGGCTAAT GAAACATTTC 
47501 AGATTTCTGA ATTAAGGTAG TCGCTGTGGG GACAGGAGCC TGGGAGGCAG
47551 GGT6GAGTCA GAATGGAGAG ACTGGTTGGC AAT6AGGGAA CAGGAOSAGG 
47601 AGGAGGAG6A GTTACGAGTG GCTTGAGGTG TCACHACCA GACATTTGGG 
47651 GGATGGGG6A TAGCCGTGAT TBTTGAGCAA CTGGTTTGGG AA6AGCTAGC 
47701 ATTGATCCCT GCTGTTCTGT GCTAGCA6AA CCTATCAQCA TCfTCTGGGC 
47751 AGGAAACTGG CTCCATGAGA CTGGCnAGG GAGAGGCTGC TAGTCAeCtA 
47801 ATCTGCAGAG AAGGGQCAGC TGGAGCTGTG GGACAGAAGA GQCATCCATG 
47851 TAGCTGGTGG GGGTGTCTCA GCTTGTGAAG AGGAGATGQC TTTGAGCAGG
47901 GCTGACACTG AAAAGQCTGG AAGAAAAAAA CAGACACACA AGAGTCTCAG 
47951 GATCAGGTAG CATAGGAAAG TrGTCGACAG TCTTTGAGGA GCACTCCCtC 
48001 AGGCAGGCAG GCAGGCAGGT CATGAGCTAT AGCGATTCAG GAA6AGCTCC 
48051 GTGGGTGtGT GAQCAGCTCG AGGAGCCTAA GG6AT6AAAG TA6TATTGCA 
48101 GGGGGCTGGA 6AGCAAGGAG TGGCTCCnC TACATTTGCA AGGGAAGGAiS
48151 AAAGGAA6TT GCTCCT6AGA 6TGGTAAGAG. TCAGTGGTGG AQQCCTGGAia 
48201 AGGAGACATA ACAAACAAAT nGHGACAA ACATTTTGGr AQGAAGGiSGG ■ 
48251 AGAGCTTAAA GTTTAGACAG TGGdGAAGGT GGAGTCHAG AQGAQGrGAA 
48301 TGTCT6AAAG ACAGAQCtAG CTGGAGCAAG AAGTCACTrC TCTGTTCCAG 
48351 GCAGGAAGGA TCCAAAGTGG CTCAAGCCAG AGATTGGGAG AGTGGGGAGG 
48401 AGGGAGCAGC CTGGATCTAA GTAAAATGGG TA6AGGTGGA GGQGGTGCTG 
48451 CAACGGCCAG'GGTnrCTGA AGTTGGGGAC ATTAGGAGAG AGCT6TGA6G 
48501 GeTTTGGCCA GCCACTGTGC rAGTGATTGG TGAACCAAAG GATGGGCAGG 
48551 A6ATGGCAGC AG66AAGCAG AGGAAGTCCA <3GCTtCCTGT TGGTATTGGG 
48601 ACAAGGGAGA GGGCATAG6A GGCCCTGGCC CTGHGTCCA GGTTGGCTTC 
48651 TGAAGCTGGG TGGGGATGGC CTGGTAGGAG AGCATCTATG GCGCCCAATT 
48701 CCAGATTCAG GGTCTAGTTG ATITGCTGGC CCTGTAGCGT CAGCTCATiSC 
48751 TTCTGTTCCA GGCCtATTTG CACTCTATGT QCATCATCCA CCGGGATCTG 
48801 AACTCGCACA ACTGCCTCAT CAAGTT66TA T6TCCCACTG CTCTGGaCt 
48851 GGCCTCCAGG GTCCTATCCT TCCTGGCTTC CTTGTCACAA AGGAGGCT6A 

48901 CHGTCCCCT CTGGCTA6AG GGCAGAGGTG nGCCTAGGA GCTCCTATCT 
48951 TTCCCTTCCT GCTTC7TCCA ATQCCCTTCT CTGTCCTCTG GGAGCTCC6A 

49001 GACACACACA GACATAATTT CACCTTCTCT CATTAGCAAC CTTTGAAATA 

49051 ATTTGATTAG AAGGGACTTC A6AAGTTTGT TGACTATATG TAGAAAACCC 

49101 TGTCATnTAXCTGCTTTTG CCCCATACTA GTCHGTAAA ACAGHCAtT 

49151 GCTGACCCCA TnTACAGTG GTGGCACCTG AAGCCTCAGC CTGAGQCCAC 
49201 CGAGCTA6TA AATTTACAGG GACCAGTTTG AGACCAGCAT TCCTCCCACT 
49251 GCCCCTCAQC TGTGGTGGTT ACAATGTTGT TrGTCTTACT GACTTGCTAT 
49301 CTCGCnCCT GGGTGTCTAC CQGCTGGCCC TGGCTCTGCC CTCTAGACCC 
49351 ACACCACQCA ATCTTCATTC CTTTCCCACA T6ACTGCCCT GTAGCTATTC 
49401 AAAGAGCHG TCTCCCCCAA GTCTCCCCAT CTACTGGCTC CACCTTGCCT 
49451 TTTTCTGTCT TATCCTGGTT CTAGCCACTG CCTGAAATCA TTTTAGGAAT 
49501 MGACAGGAC AGGGAAAAAC AAAAGCAACC CCCTGTCCCA CCTCTGAGTT
49551 CCACTCTCCA AGTCCCtGAG CCTCACCTCC AGGGCTCCAG TGGCTCTGCC - 
49601 ATGAACCCAC TGTG6GCTGG GAGTCTGCTG TGCACAGATA CCAGACCCTC 
49651 AGAAAC ACAA ATGCCAAGTG TGTCTGTTTT TrTGTTTTGT TTTiEnTTtGT 
49701 TTTTTAGATiG GAGTCTCATT CTGnTCCCA GGCTGGAGTG CAGTGGTGCA 
49751 ATCTTGGCTT ACTGCAGCCT CTACCTCCCG GGTrCTAGTG ATTGTTCfGC 
49801 T TCAGCCTCC CAGTAGCTAG GACTACAGGC GTGTGCCACC ACGCCCAGCT 
49851 AAIIIIIIM llllllllll TGTATmTA GTAGAGACAG GGmTGCCA 
49901 TGTTGGCCAG GCTGGTCTTG AACTCCTGAC CTCAGGTGAT TCACCCGCCT 
49951 TGGCCTCCCA AAGTTCTGGG ATTACAGGTG GAAGCCACCG TGCCTSffiCT 
50001 GACTGTGTCT ATTT6ATAGA GCTTTCTGCT CTGATTCTCC CHGCTATAC
50051 ACCTITTCTC CCCTTCTCAG TGGCTTCTCT TGCCTATGCT TCCTCCCCAG 
50101 GGCCAGGTH GAGAACATCC CCATGAAGTC CTGACCTGTC TTTTATCCTA 
50151 CCAGGACAAG ACTGrGQTeG TGeCAGACTT TGG(KT6TCA CQGCTCATAG 
50201 TGGAAGAGAG GAAAAGQGCC CCCATG6AGA AGGCCACCAC CAAGAAAC^ 
50251 ACCTTGCGCA AGAACGACCG CAAiSAAGCGC TAGAGGGTGG TGG6AAACCC . 
50301 CTACTG6ATG GCCCCT6AGA TGCTGAACGG T6AGTCCT6A AQCCCTGGAG 
50351 GGGACACCCG CAGAGGGAGG ACAGATGCTG CCCTTGCATC AGAGCCCTGG 
50401 GAATTCCAGG GGAGGCCTGT GAAGC6TAGG ACCGGATACC CAGAGCTGAG 
50451 GATATTITTC CCTTGCCAGG TGGGGCCTCA C6ATTTAQCT CCTGAGCTCA 
50501 GGGGGCTGGG AACTGATCAG TGTCCCATCA TGGQG6ATAA GSTGAGITCT 
50551 6ACTGTGGCA TTTGTGCCTC AGGGATCQCT AAGAGCTCAG GCTATTGtCC 
50601 CAGCTTTAGC CITCTCTCTC CATGGTGAGA ACTGAAGTGT GGTGCCCTCT 
50651 GGTG6ATAAT GCTCAAACCA ACCAGAGATG CTGGTTGGGA TTCTrGAAAT 
50701 CAGGGnGTG AGGCCTCAGA AATGGTCTGA ATACAATCCA TnTGGAGTC 
50751 TGAGGCCCAG AGAASTTCAG TGAATTGCCT AGGAGCATAC AGCTGCCTAA 
50801 TGGCAGAGQC TAGATGAACC CTAGTCTGGT TCTTTTCCAC TTTAACGrTGC 
50851 AGTTTCATCC TAGQCAGTGT TATGTTATAA GGGCTCTCCA AQGCAGTTCA 
50901 CCTACGGCTG AQGAAGGACT ATTTTCAGGT GGTGTCTdCG CAQGACA(3CC 
50951 TGTGGGGTGT.CCCTACA6AA CCTGHCTAG CCCTAGTTCT TAGCT6TGGC 
51001 TTAGATTGAC CCTA6ACCCA GTGCAGAGCA GGTAAGGGAT GTAAACTTAA 
51051 CAGTGTGCTC TCCTGTGTTC CCCAAGGAAA GAGCTATGAT GA6ACGGTGG 
51101 ATATCTTCTC CTTT6GGATC GTTCTCTGTG AGGTGAGCTC TGGCACCAAG 
51151 6CCATGCCCG AGGCAGCAGG CCTAGCAGCT CTGCCTTCCC TCGGAACTGG 
51201 GGCATCTCCT CCTAGGGATG ACTAGCTTGA GTAAAATCAA CATGGGTGTA 
51251 GGGmTAtG GTTTATAACG CATCTGCACA TCTTTGC CAC GTTCGTCTtr 
51301 CATreGTCTT AAGAGAAG6A CTGGCAGGGt llllilGMI TAGA7GGAGC 
51351 CTCACnCGT TGCCCAGGCT GGAGTGCAGT GGCACAATCT GQQCTCAaG 
51401 CAACCTCTGC CTTCTGGGTT CAAGTGAHC TCCTGCCTCA GCCTaCAAG 
51451 TAGCTGGGAC TACCGGCACA CACCACCATG CCCQGCTAAT TTTTGfTATTT 
51501 TTAGTAGAGA CAGGGTTTCA CCATGRGGC CAGGCTQGTC TTGAACTCCG 
51551 GACCTCAGGT GATCCGCCTG CCTCAQCCTC TAAAAGTGCT GGAAH AATA 
51601 GGCGTGAGa ACCTCGCCCG GCCAG6TTTT 1 1 1 1 1 1 1 1 1 1 TTtiTAGTTG 
51651 AGGAAACTGA GGCTT6GAAG AGGGCAGTGG CHGCACATG GTC6ATAAG(3 
51701 GGCAGATGAG ACTCA6AATT CCAGAAGGAA GGGCAAGAGA CTGTTCAtGT 
51751 GGCTGTCTAG CTAGCTCHG GGCCAAATGT AGCCCTTCTC AGTTCCCTTC 
51801 AAGTAGAAGT AGCCACTCTA GGAAGTGTCA GCCCTGTGCC AGGTACCACG 
51851 TGGACAGAGT GAGGAATCTT GGAAAGA7TC CTACCTTTAG GAGITTAGTC 
51901 AGGTGACAGC ATATCTCAGC GACTCAAACA CACACACATT CAAAGCCTTC 
51951 tGTAATTCCT ACAAAGTTGT GAQGGGTAGA G 6AGAGG AGA GACAA6GGAT 
52001 GGTTAGGATA AT6AAG6AAT Glllltiilll TGnTTTGTT TTTGAGAJGG 
52051 AGTTTCACTC TGTCACCCAG GCTGGAGTGC A6AGGTGCAA TCTtGGCTCA 
52101 CTGCAGCCTC CGCCTCCCAG 6TTCAAQCAA TCCTCCTGCC TC AGCCTC CC 
52151 AAGTAGCTGG GACTACAGGT 6TGCGCCACC ACGCCTGGCT AATTTTTGrA 
52201 TTTTCAGTAG AGACAGG6TT TCGCCATATT GQCCAGGCTG GTCTCAAATG 
52251 CCTGAGCTCA GGTGATACAC CCGCTTCAGC CTCCGAAAGT GCTGAGATTA 
52301 CAGGGATGAG CTACGGTGGC TGGGCATGAA GGAAGATTTG TTTTAAAAAA 
52351 TTGTTTTCTT TAATATTAAT TGAACACCTC TCTTCAGAGC ACTGGGCTGG 
52401 TGCCAGAGGG THCAGACAT GAATCAGATC CAGCACCTCA TAGAGCCTTA 
52451 ATCTGGCACA CACACACAGG CACAAQGAGA CACAGAGAAG GCAGGGTAGG 

FIG. 3-21 






U.S. Patent Jan. 22, 2002 Sheet 27 of 41 US 6,340^83 B1



52501 ATGAGTGGAA GCTAGGAGCA GATGCTGATT TG6AACAC7T GGCnCTGCA 
52551 GTGAAGCCCC TTCTTACTCC TCTTCAGTM CCCAGCTCTC AGTGGATACA 
52601 GGTCTGGAn AGTAAGATTT GGAGAGAT6A TTGGGGATTG GGGAGAGCTC 
52651 TCTAACCTAT TTTACCACCT CCTCTTCTGC CAnCTFCCT GTCCACATCC 
52701 CCAGCATCCC TTTCCCTTGC CAAGTATGTG TGGCCTCTGT AGTCCTTTGT
52751 AAACAGCTGT CTtCTTACCC TACAGATCAT TGGGCAGGTG TAT(3CAeATG 
52801 CTGACTGCCT TCCCC6AACA CtGGACTTTG iBCCTCAACGT GAAGCTTTTC 
52851 TGGGAGAA6T TTGTTCCCAC AGAHGTCCC CCGGCCTTCT TCCCGCTGGC 
52901 CGCCATCTGC TGCAGACTGG AQCCTSAGAG CAGGTTGGTA TCCTGCCTTT 
52951 TTCTCCCAGC TCACAGGGTC CTGG6ACGTT TGCCTCTGTC TAAGGCCACC 
53001 CCTGAQCCCT CTGCAAGCAC AGGGSTGAGA GAAGCCTT6A GGTCAAGAAT 
53051 GTCGCTGTCA ACCCCTGAGC CATCTSACAA CACATATGTA CAG6TTGGAG 
53101 AAGAGAGAGG TAAAGACATA GCAGCAAGTA ATCTGGATAG GACACAGAAA 
53151 CACAGCCATT AAAAGAAAGT TTAAAAGAAG GAAAHCACC CAAACCATTT 
53201 GAATACAGTA AGTGTATTCA TCTFTCGATA TTCCCCTGTC CATATCTAGA 
53251 CATATAC7TT TTnTATAGT AAATAGTTCT GT ATTTTG CC CTGCATTTCC 
53301 CTTlGTlGnTA CTATCCA6TC nCCTSTTTA TCATTTrrGT CGACAACATG 
53351 AAATTCTATT GAGAGACTGT CTGAACATAT TGTAATGTAG ATGnCAGGT 
53401 TTTrCCAfiTT TCTCnTACA ATAGGTATTT AACTACAGTG AGCAGTTTTA 
53451 TGCATTTAGC TAATTTCTCC TITGAGGAAG tATTTTCAAA ATTACCTTTA 
53501 TTCTTCtCAG GTAATAATTT CATTATTACC AAAGTTACCC TAGGTCltlt 
53551 CAAGTGTGTG GTTAAAAAAC GAGAATCTGG CTGGGCGCGA TGGCtCACAC 
53601 CTGTAATCCC AGCACTTTGG GAGGCTGAGG CTGGTGGATC ACCT6AGGTC 
53651 TGGAGHCGA GACCAGCCTG GCCAACATGG TGAAACCCCA TCTCTACTAA 
53701 AAATACAAAA CTTAGCCAQG GATGGTGG(>\ GGTGCCTGTA ACCGCAfiCTA 
53751 CTTGGGAGGC TGAGGCAGGA GAATTGCTTG AACGCAGGGG CGGAGGTTGC 
53801 AGTGAGCCGA TATCACGCCA TTGCACTCCA GCCTCGGCAA CAAGAGT6AA 
53851 ACTCTGTCTC AAAAATGGGG nCTTTTCCT GCCATCAAAA ATCAT6TTTC 
53901 nTTAAAAAC AAGTTCAAAC ATTACCAAAG TTTATAGCAC AGGAAATACG 
53951 TCTTCTGTAA TCTCCCTTAA CCAATATATC CCTCAACATT CTCCTCACCC 
54001 CCAACTCCAC CCTCCCAGGA TAACCAGTTG GGACATAATC mATTTAAA 
54051 AATGGTTTCC G6ATAGAGAA AGCGCHCGG CGGCGGCAGC CCCG6CGGCG 
54101 GCCGCAGGGG ACAAAGGGCG GGCGGATCGG CGGGGAGGGG GCGGGGCGCG 
54151 ACCAGGCCAG GCCCGGGQGC TCCGCATGCT GCAGCTGCCT CTCGGGCGCC 
54201 CCCGCCGCCG CCCTCGCCGC GGAGCCG6CG AGCTAACCTG AGCCAGCCGG 
54251 CGGGCGTCAC GGAGGCGGCG GCACAAGGAG ^CCCACG CGCGCACGTG 
54301 GCCCCGGAGG CCGCCGTGGC GGACAGC6GC ACCGCGGGGG GCGCGGCGTT 
54351 GGCGGCCCCG GCCCCGGCCC CCAGGCCAGG CAGTGGCGGC CAAGGACaC 
54401 GCATCTACTT TCAGAGCCCC CCCCGGGGCC GCAGGAGAGG GCCCGGGCTG 
54451 GGCGGATGAT GAGGGCCCAG TGAGGCGCCA AGGGAAGGTC ACCATCAAGT 
54501 ATGACCCCAA GGAGCTAGGG AAGCACCTGA ACCTAGAG6A GTGGATCCTG
54551 GAGGAGCTCA CGCGCCTCTA CGACTGCCAG GAAGAGGAGA TCtCAGAACT 
54601 AGAGATTGAC GTGGATGAGC TCCTGGACAT GGAGAGTGAC GATGCCTGGG 
54651 CTTCCAGGGT CAAGGAGCT6 CT6GTTGACT GTTACAAACC CACAGAGGCC 
54701 TTCATCTCTG GCCTGCTGGA CAAGATCCGG GCCATGCAGA AGCTGAGCAC 
54751 ACCCCAGAAG AAGTGAGG6T CCCCGACCCA GGCGAACGGT GGCTCCCATA 
54801 GGACAATCGC TAaCCCCGA CCTCGTAGCA ACAGCAATAC CGGGGGAGCC. 
54851 TGCGGCGAGG GGTGGTTGCA T6AGCAGGGC TGCTGGTGCG GGTGGCGaG 
54901 GGGTGTGnC CCaGCCGCC TCAGnTTCC AGTTTTGGAT Mill lAtTG 
54951 TTATTAAAGT GATGGGAGTT TGTGTTmrA TATTGAGTGT GGGGGAGGGG' 
55001 CCCTTTAATA AAGCGAQGTA GGGTACGCCT nGGTGCAGC TCAAAAAAAA
55051 AAAAAAAAAT GATTTCCAGC GCTCCACAn AGAGHGAAA TnTCTGGTlS 
55101 G6AGAATCTA TACCTTGTTC CTTTATAGGC CAAGGACCQC AGTCCTTCAG 
55151 TAACACCAGT GTAAAAGCTT GAGGAGAAAJ TGT6AAGGTA CACAGTATTT 
55201 GrnTCtAAT j«:eTCTTGTC AHCTAAATA TCmAAm 
55251 ATATATATAC AGTATTGAAT GCCTACT6TG TQCTAG6TAC AGrTTCTAAAC , 
55301 ACTTGG6TTA CAGCAGC6AA CAAAATAAAG GTGCnACCC TCATAGAACA 
55351 TAGAHCTAG CATGGTATCT ACTGTATCAT ACAGTAGATA CAATAASTAA 
55401 ACTATAHGA ATATTAGAAT GTGGCAGATG CTATGGAAAA AGAGTCAAGA, 
55451 CAAGTAAAGA CGAnGTTCA GQGTACCA6T TGCAATTTTA AATATKTCG 
55501 TCA6AGCAGG CCTCACT6AG 6TGACATGAC ATTTAAGCAT AAACATGGAG^ 
55551 GAGGAGGAGT AAQCCTGAGC IGTCHAGGC HCCGGGQCA GCCAAQCCAT 
55601 mCGTGGCA CTAGGAGCCT GGTCnTCCG ATTCCACCTT TGATAA CTGC 
55651 ATTTTCTCTA AGATATGGGA GQGAAGnTT TCTCCTATTG WTTAAGTA 
55701 TTAACTCCAG CTAGTCCAGC CnGHATAG TGTTACCTAA TCTTTATAGC 
55751 AAATATAT6A GGfTACCGGTA ACAHATGCC CATTTaCAC AGAGGCACTA 
55801 CTOGGTgJaG GAGmOCCT GACGTTATAC AACCAG6AAG TAGCTCAGCC 
55851 TAGATCCCTT CCACCCACCC CATGGCCCTG CTCAT6TTCC ACCTGCCTCT 
55901 MtTTACCTC TTTTCCnCT AGACCAGCAT TCTCGAAATT GGAGGACTCC 
55951 TTTCAGGCCC TCTCCCT6TA CCTGGGGGAG aGGGCATCC CGCTGCCTGC. 
56001 AGAGCTGGAG GAGTTGGACC ACACTST6A6 CATGCAGTAC GGCCTGACCC 
56051 GGGAGTCACC TCCCTAGCCC TGGCCCAGCC CCCT^^ S^^JJS? 
56101 CAGCCAGCAT TGaCCTCTG TGCCCCATTC CTGCTGTGAG CA^GCCGTC 
56151 CQGGCTTCCT GTOGAnGGC GGAATGTTTA GAAGCAGAAC AAGCCATTCC 
56201 TAHACCTCC CCAGGAQGCA AGTGGGCGCA GCACCAGG6A AATGTATCTC; 
56251 CACA^CT GGGGCCTACT TACTCTCTCT AAATCCAATA CTTGCCTGAA

56301 AGCTfiTGAAfi AAGAAAAAAA CCCCTGGCCT TTG66CCAGG AGGAATCTGT
56351 CACCCAGGAA CTCCCTGGCA GTGGAHGTG GGAGGCTCTT 

56401 QCHACACTA ATCAGCGTGA CCTGGACCTG CTGGGCAG6A TCCCAQGGTG 
56451 MCCTGCCT6 TGAACTCTGA AGTCACTAGT CCAGCTGGGT GCAGGAGGAC . 
56501 -nrCAAGIGrG TGGAC6AAAG AAAGACTGAT QGCTCAAAGG GTGTTGAAAAA 
56551 GTCAGTGATG aCCCCCTTT CTACTCCAGA TCCTGTCCTT CCTGGAGCAA 
56601 GGTTGAGQGA GTAGGTTrTG AAGAGTCCCT TAATATGTGG TGGAACAGGC 
56651 WGGAGTTAG AGAAAGGGCT GGCTTCTGTT TACCTGCTCA CTGGCTCTAG
56701 CCAGCCCAGG GACCACATCA AT6TGAGAGG AAGCCTCCAC CiaTGnTT 
56751 CAAACTTAAT ACTGGAGACT GGCTGAGAAC HACGGACAA CATCCTTTCT 
56801 GTCTGAAACA AACAGTCACA AGCACAGGAA GAGGCT^ 
56851 AGQCCCTQCC CTCTA6AAAG CTCAGATCTT GQCTTCTGTT ACTCATACTC , 
56901 GGGTGGGCTC CTTAGTCAGA TGCCTAAAAC ATTITGCCTA AAGQTCGATG 
56951 GGireiGGAG GACAGTGTGG CHGTCACAG GCCTAGACTC T6AGGGAGQG, 
57001 GAGTGGGAGT CTCAGCAATC TCTTGGTCTT GGCTTGATOG CAACCACTQC 
57051 TCACCCnCA ACATGCCTGG TTTAGGCAGC AGCTTGQGCT G^AAGAGCT 
57101 GCTGGCAGAG TCTCAAAGCT GAGATGCTGA GAGAGATAGC TCCCTGAGCT. 
57151 GGQCCATCTG ACTTCTACCT CCCATGTTTG CTCTCCCAAC TCATTAGCTC 
57201 CTGGQCAGCA TCCTCCT6AG CCACATGTGC AGGTACTG6A AAACCTCCAT. 
57251 CTTGQCTCCC AGAGCTCTAG GAACTCHCA TCACAACTAG ATTTGCCTCT 
57301 TclSic WGAGCTTG CACCATATTT AATAAATTg GAATG^
57351 GGGGTATTAA TGCAATGTGT GGTGGTT6TA TTOAGCAGG GOAATTGAT 
57401 AAAGGAGAGT GGTTGCTGTT AATATTATCT TATCTATTGG GTQCTAT6TG 
57451 AAATATTGrA CATA6ACCTG ATGAGTTGTG GGACCA6ATG TCATCTCTGG 
57501 TCAGAGTTTA CTTGCTATAT AGACTGTACT TATGTGTGAA GTTTGCAAGC

57551 TTGCTTTAGG GCTGAGCCCT GGACTCCCAG CAGCAGCACA GHCAGCATT 

57601 GTGTGQCTGG TTGTTTCCTG GCT6TCCCCA GCAAGT6TAG 6AGTGGTGGG 
57651 CCTGAACTGG GCCATTGATC AGACTAAATA AATTAAGCAG TTAACATAAC 
57701 TGGCAATATG GAGAGTGAAA ACATGATTGG CTCAGGGACA TAAATGTAGA r 
57751 GGGTC7G(TA GCGACCnCT GGCCTAto CCCATAGCAG 
57801 AGAGmrCA TGCACCCAAG TCTAAAACCC TCAAGCAGAC ACCCATCTGG 
57851 TCTAGAGAAT ATGTACATCC CACCTGAGGC AGCCCCTTCC TTGCAGCAG6 ^ 
57901 TGTGACT6AC TATGACCTTT TCCTGGCCT6 GCTCTCACAT GCCAGCTGAG 
57951 TCATTCCTTA GGAGCCCTAC CCTTTCATCC TCTCTATAT6 AATACtTCCA ' 
58001 TAGCCTGGGT ATCCTGGCTT GCTTTCCTCA GTGCTGGGT6 CCACCTTTGC 
58051 AAtGGGAAGA AATGAATGCA AGTCACCCCA CCCCnGTGT TTCCmCAA 
58101 GTG(riTGAGA GGAGAAGACC AGTITCTTCT TGCTTCTGCA T6TGGGGGAT
58151 GTC6TAGAA6 AGT6ACCATT GGGAAGGACA ATGCTATCTG GmGTGGGG ■'■ 
58201 CCnCGGCAC AATATAAATC TGTAAACCCA AAGGTGTTTT CTCCCAG6CA 
58251 CTCTCAAAGC TTGAAGAATC CAACTTAAGG ACAGAATATG GTTCCCGAAA 
58301 AAAACTGATG ATCTGGAGTA CGCATTGCTG GCAGAACCAC AGAGCAATGG 
58351 CTGQGCATGG GCAGAGGTCA TCTGGGTGTT CCTGAGGCTG ATAACCTGTG 
58401 GCTGAAATCC CTTGCTAAAA GTCCAGGAGA CACTCCTGTr GGTATCTTTT 
58451 CTTCTGfeAGT CATAGTAGTC ACCHGCAGG GAACTTCCTC AGCCCAGGGC ' 
58501 TGCTGCAGGC AGCCCAGTCA CCCnCCTCC TCTGCAGm TTCCCCCTTT 
58551 GGCTGCTGCA GCACCAGCCC CGTCACCCAC CACCCAACCC CTGCCGCACT 
58601 CCAGCCTTTA ACAAGGGCTG TCTA6ATATT CATTTTAACT ACCTCCACCT
58651 TGGAAACAAT TGCTGAAGG6 GAGAGGATTT GCAATGACCA ACCACCHGT 
58701 TGGGACGCCT.GCACACCTGTCTTTCCTTGCTTCAACCTGAAA^^ 
58751 TGAT6ATAAT GTGGACACAG MiSCCGGQCA CGGTGQCTCT AGC^^ 
58801 CTCAGCACn TGGiGAGGCCT CAGCAGGTGG ATCACCTGAG ATCAAGAGH 
58851 T6AGAACAGC CTGACCAACA TGGTCAAACC CCGTCTGTAC TAAAAATACA 
58901 AAAATTAGCC AGGTGTGGTG GCACATACCT GTAATCCCAG CTACTCTGGA - 
58951 GQCTGAGGCA GGAGAATCGC HGAACCCAC AAGGCAGAGG nGCAGTGAG : 
59001 GCGAGATCAT GCCATTGCAC TCCAGCCTGT GCAACAA6AG CCAAACTCCA 
59051 TCTCAAAAAA AAAAA (SEQ ID NO:3)



FEATURES: 
Position

941 GAGTMGTGGGTGGTCAGGTTACAGACTTMTTlTGGGTOAAAAffr^^ 

MGGTGTGGCTCTAAMTMTGAGATGTGCTGGim"GGGGCATGGCAGCTCATAM^ . 

ACCCTGAMGCTCTOCATGtMGAGTTCCAAAMtATTtdCAAAACI^^ 

TTCGATGTtTGTGTTCAnAAAATCTCTCAGtAAHC^^ 

CCCAACCTGGGATTGGm6AGT6ACTCTCTCAGACmCTGC(m^GGAGnT6TGAfiAG 
CA.T3 : 

GATGGCATACTCTGTGACCACTGTCACCCTAAAACCAAAAAGGCaCTCTTGACAAGGAG 
TCTGAGGATmAGACCCAGGAAGAATGAGTGATGGGCATATATATATCCTATTACTGAG 
GCATGAGAAGAGTGGAATGGGTGGGTTGAGGTGGTGrrrTAAGGCCTCnGCCAGCTTGT 
TTAACTCnCTCTGGGGAACGAGGGGGACAACTGTGTACAnGGCTGCTCCAGAATGATG 
n6AGCAATCnGAAGTGCCAGGAGCT6TGCmGTCTAnCATGGCC(XT6TGCCT^^ 

2612 TGAGnGGAACAGTTTGATACaWAACCATCCCCCCGCCCCCCAACCCCC^ 

CCGTGGAAAAATTGGCCCCTGGTGCCAAAAAQGn^GAGGACTGCTGAtCTAGAGGACCAA 
mAnCAATGTTGGnGAGTAAATGAGCTCTTGGAmGGTGAtGGAAAAATCTGAAA^ 
AACAGGGCTmGAGGAATAGGAAAAGGCAGTAACATGmAACCaGAGAGAAGrrrCT 
GGCTGTTGGCTGGGAATAGTCATAG6AAGGGCTGACACTGAAAAGAAGGAGATT6TGTTC 
[G,A]

TTTCncnCTCAGAGCTATAAQCAAAGQCTGAAAGTTCTAGAAAAAGGCAAGTTnGTT 
TCAGTAGAAAAAAGGATAATCAGAACCATTTTTAGAAAATGGAATGAGACTACTTITGAG 
GCCATGAGmCTTGTCCCTGGAGAGATGAGCAGAGGTTGGACAAGTGCTTACCAGAGAT 
CnGTGGAGGCAGAAACTGTGCATCTAGCAGAGCATTGGCCTAACCCrnrCAAATGAGAT 

GCTGTTMCTCAGTCTTATTCTACAtGCTAGGMTCCTGTCCCT^ 

5080 ACAACGTAAAATAGTTGAMmGnGGTGGAAAGAAGAGCAGTCCACTCW^ 

ATGGGCATGCCTGGCCCCCAAGGTCTGAAGTGGTAGGGCTGTGCCTAtATCCTGAGAATG 
AGATAGACTAGQCAGGCACCTrGTGCTGTAGATTCCAGCTCCTGGACATAGCTCTTGTTG 
TAAAACATCCCWGCTTATACCAAGTAAnGAGTTGACCTlTAAACACTTGCCTCTTCC 
CTGGGAACCATATAGGGGATTGGCCTGGAGACGTCTGGCCTCTGGAAGAGTTGGAAAGCA 
[G,A]

CCATCATTAmTCCmCCmCAGCTATAACTCAGAGCTCTCAAGTCTTTTCTGTGGA 
TCmTTGCCTTGGncnGCCCCTTTTACTttCAGGGAAGnGATTCTGTCnTTCTGT 
TCCAmAGTATGACAGGAGCAGAGAATGTCAGAGCTGTAAGGGACCTTATAGnAAAGC 
CTnOQCTGGTCCTnCATTTTATAGCTGGGACTAATAAGTAACGTCAAAACGCAATGAG 
TTCACAGAnGGGTCTCGCCnGGCATGTAACCCATATGnCATATTCTTGCTGTmCC 

6599 CTGTAATCCTAQCACTCTGGGAGGCCGAGGCAGAAGGATCGCnGAaCCATGAGCCCAG 

GAGTTTGAGACCAGCCTGGCCAACATGGCAAAAtn^CCACCTCTACAAAAAATACAAA^ - 

AnAGCCAGGCGTGATGGCAGACACCTGTAGTCCCAGCTACnGGGAAGCTGAGGAGCGA 

TGAnACCTGAGCCCAGGGAtATCAAQGCTGTAGTGAGCTGTGAtCATGCCACTGTAC^^ 

CATCCAGCTGGGGGACA6AGTGAAACCCCTGTCTCAAAACAAAACAAAT6AAAAAAAAAA 

[-,A,C]

CCnAATAATCAGTAACTGTCACTTTATATTATGTTGTGAGTGTGTlGTCTATATACACCT 

ATATGTATACATTTCTCTTATTACACATTCATTGGTGATCTGATGTGGAGCCCCAGGGAT 

TAAGGGCAACmGAACTACCCTGACACAATCAAGCCAAATATCAnCCCGTGGAGGA^ 

TAGAGTATCTAGGTTCTGTCTCCTAGnGCAGCmACCTTGAGGACAGAGACTCTAATC 

CAGCTGTGCTGAAGGAGCACATCTCCTGACnCTGAGCTTTCCCCTGGTAAAnCAAACT 
6983 CACATTCATTTGGTGATCTGATGTGGAGCCCCAGGGAnAAGGGCMCTTT^

GACACMTCMGCCAAATATCAnCCCGTGGAGGMGTAGAGTATCTAGGTTCTGTCTCC . 
. TAGTTGCAQCTn'ACGTTGAGGACAGAGACTCTAATCCAGCTGTGCTGAAGGAGC^ 
TCCTGACmTGAGeTTTCCCCTGGTAAAnCAMCTGGATGTCAC^ 
GAGCCTGGTAAmGCCCTGGGGAGAGTGACTCTCTmGGATCTAAmGACTTTTGCC 
[C,G]

CAGnGGAGGAAAATCTTCAGGQCTAGGAAGGATTGTAnTGTCTGACCCCAGAGATAAC 
CTGQGTmGAGGAACATGGGGCATCAACCTGAATGCTCnGTAAGATCTCTCCCACGCC 
AGCnGCCy^GTGTTTCTCTGATGAATTTAGAGTACCTGAGTAGTGCAGGCCTGCTGGGAG 
GAGGACTCTCCCTCTGTGCTACTCAGAGAAAnCAncnCAAGGCCCCmCCAGCCTT 
GCTCma:CAGCTGGGCTACAGTOCAATAAAGGAAAT6ACTTncrrTCTaCC^^ 

9885 GGCGTGCCACCACACCTTGCCAI 1 1 i 1 1 1 1 lATTTTAAGTAGAAACAAGGTCTTATTAAt 
ACTATGTTGCCCAGGCTGGTCnGAACTCCAGCCAtCCTCCTGCCCCAGCCTCCCAAAGT 
GCTTGGGAmCGGAAGTAAGCCACTGTGCCTGGCCAGTGCAACCCCCATnTATACTAA 
AACAGGAAGGCC(yVGAAAGGlTTGGAGrAACTTGTCCAGGGTCACACAGATGATA7TTGA 
ACTCAGGTCTCCCTGGCTCCCAAGAGAGTCTGCTTTCCACTAGGACTCCCAQGAGAAAAA 

[A,-]

AAAAAAAAA/UVCAGTAGACTtGGAGACAGAAAATCTGAmGAGTCTrAffTTGAGCTAGG 
CTAACTGTGTAACTGTGGGCAAGnTCCnAGCCCCTGTGAGCCTCAfimcmTCTGTA 
AAATGTCATAAAAGAAATCCATCTCATGGAGTAGTTGTGATGATCAAGGACTCTGAAAAC 
ATTAGAATGGtTTAATGTGAAGGAnAGCAGCAGCACATGGCAACATTCTGCATCTTATA 
TTAACTATCCAAATATATCAAGCGTCATFGCTATATATAAAAGTCATC^ 

12538 ACTTG6(WGGCTGAGGCAGGA6AATCACTTGAACCTGGGAGa^ 

CAGATCACGCCACTGCACTCCAGCCTGGTGACAGAGTAAGACTCCATCTCAAAAAAAAAA 
AAAAAAAAAAAAATTCCnAATTTGGCCTACAGTAGAGCCCTCCGTAATGTGGCCTCTCT 
CCACATCTCCACAACCTCCTGCTCCCTQCACnCAGCCTCACCTCTmCTGGACAGGCC 
CTCCnCTGACAAQGGCTnGTTCAmTGCTCCCTCTGCCTAGAATGCCCCCTTACTCT 

[G,T]

TTCACrrAACTCCTGCmTCGmAGATCTTTACCTGGATG GCTCAG AGAAATATAGAA 

GTAAnCCT(y\a:CTGAAAAATAGGmGGTCCCTGTTmTGTmCATAGACCTTTCC 

mGAGGCTTTTTTTAAAAAAGTAGTmAATCTCACAmAnX^ATGTG ATCATC^ ^^ 

TAATGATATCTTAAGACCTCTAATAGAACAATTTGGTCATGGACTGTGGGGI 1 1 1 IGCCC 

CTCATrGTGTCAGCACTGAGCATAnGnGGCATAGGAGGGATATTTGTTGAATGAATTG 

17707 GTAGTGGGTGCTCAGAGTGmOCTGGGTGAATGATGTAmGTTGAACGACTCm 
CA(rrTGAATAAAGTCCATCCAGTATGCACCAnACCATCrCTTC€K:TCTACM^ 
mOQCAAGAGOTATinmGAGGTGATAAGATMGCTCAAACTTAT^^ :^ 
CTCAGTCfGTAAATiSTCATCCCTAAGrCTrAAACCATCAAAACCAG GK 
GCATQCCTTCrGCAACTGTAGCMCCTGCTGTOCTtATTTTG^ 1 1 1 It 

[T,C]

CC(WV\AQCTAGAGTCCCrrCTCCCATGGGCAGTGCTGGAAGTGTGCTAACAAATrCTTT 
CTCCATACTG(nTACGATtACAAAAAAAACCCTCAGCATCTCATGCCAGACn6A 
GGTTGTmCTmGTGTGTCAGCTGTAnCTGGTCATGACTTCCTGATGATGCCCTATA 
GAGATTTTCCTGAGATCAGA6GGTGCTC(mGCCATCAGTAGCACTGACTCTTG CAGAA 
GCAaGTTTCTGAAGTTGGCTAATGTCATCCCTOVCGm^GTnGTTTGAAAi I IGi 1 1 1 
18219 TGCCATCAGTAGCACTGACTCTTGCAGAAGCACCGTTTCTGAAGTTGGCTAATGTCATCC
CTCACGTTTGmGmGAAATTTGTmAGTTCMGAGATAGCACTTTCATGGAATGAC 
GCTATCTTCTAGAATCAC I J 1 1 1 1 1 1 1 1 1 1 1 1 IGAGTTGGAGTCTCGCTGT6TCGCCAGG 
CTGGAGTGCAGTGaACAATCTCAGCTCACTGCAATCTCCACCTTCCG^ : 
TCCCCTGCCTCAaCTCCCGAGGAGGTGmCTACAGGCGCACACC^^ 
[-,A]

TTfrATGTGTTTOGTAGAGACGQGGTTTCACCGTGTTGGCCAGGATGGTCTC^^ 
TGACTTTGTGATCTGCCTGCnCAGCCTCCCAAAGTGCTGGGAnACAGGTGTGAGTCAC 
CGCGCCTGGCCTAGAATCACCTTmATAjXATAACGTGAQ[:ACCACTGCCGCGT^ 
AGGAAAGAGAGAGGCAGCTACTGTGGGGTTACAAATGGGTAAGAGTGGCACCAGGAAGGT 
GAAAGTCTCTACmGCCAAGGCnAACAAAATGrCAATCACCAAACATrrATTTAnAA 

GACCCCCATGATGAGCAACTATAGCACTAGAACAGTGATAATAACTAATGTTTATAAT^ 
ATCTTCAGTnACAGAGGGCTmCTACTCATCATCTAGTrTAGnCCTGCAACAACCTC 
nGAGGAATATAGCACAAGCAGGACAAGGGAAGCCCAGAGATGTTAAATAAmATCCAA 
GTmTGCTGCTGGGMGGGCAGCACTGAAATTAAAAGAAAAGTTTTCTGAGCTCAAATC 
CCATGCCCmCCTCAATGTGAGCTCTAGCAAQGTAnCAGGAATCCTGCCTCTACAGT^ 
[C,T]

AGAGCCTCAAATTGCTGGGTATGTTGAGncnGTATCTGATnTTCTAGATTTCaGCC 
CACATTCnACTGTCTGGATATCAGGAAAGAGnTATCAAATGCCTGTGGAAATCCAAGA 
TAAGGTCTCATGATGAGTAACCCAGTGAAAACATGAAGTCAAGTCTAACTAGTCACTACT 
ATnCACTACTGCTGACTCCTGATGATCAGCTCCTmCTAACT^GCnACTGTCCAC™ 
nCaVTCATCTGCCTAGAATITATGTGAAGGAATCAAAG(7^ 

GGACCCTtGnmG^AGGATGACTGCTGCTATAATGTAGAAAGTGAmGG^ . 
AGGAGTfiGGGCACGAAAGATGGTTAGTAGATGGGGGTGGTMTGCmCCTTTCAGTAtr 
TGGAGGCnCGGAGTCCTCAAAAAnCTCTTCCTTGAnGGAGTCCTCCCAGCCAATAGA 
6GGCnCACACAAACAGmCTTG66TmGAATTGTTTGACCAGAGCTTT(rrrCCGA(^ 
AAAGGTTGGGGTGATTCAnCACnACCACACCnGCCTGAACATTCACTTGGGGCTGCC 
[G,T]

GnATGAAGGCTATTGrrCTCCAGCCTGTCACAGACGCTTrGAAGACCTGTGCCTCAGCT 
GGnCTAAGGAGTCAGmGmAGCTCCGTGCCAGGTTTCCAACtrATGAAATGTGCTG 
GAGATrAACACCTCTCCTGCCATmATCCCTACTATAATTaCAGT(^ 
CAGTTGCCTCTGGCAGCCATAACTGATGAATGTTCTGCCAGCTGCTCTGAGGACCTAGAA 
GAGCAGTmCTATCCAGGACCAGTTTCCAAGGGTGGGAGGGTGAAATATATCCTCCAGT 

24566 CTACTCTGGAGGCT6AGGf6AGAGGATCACnGAGTCCAGAAGGTC6AGGT(y\AGATTST 
AGTGAGCCAT6ATGGCATCACCGCACTCCAGCCTGAGTGACAGAGAGAGACCCT6ACTCA 
AAAAAAAAAAAACAAAAAAAAAAAACACCCTCACGACTTATCAGCTATTTGTCTTGAGAA*^^ 
TAGTGACATAACCCCTCAGAACCrAmCGTAATCTGTTAAATGAGGCTGAT^^ 
CTCCTmACTGGCAATTTAAACATGATGGATAATAAATGCTAAQCAC™ " 
[C,-]

TAGAAGATATTAACTGCTCMTAAATGGTAGCTTCTTAACAGTAnCAAACCCATGlTGCT 
CmTCACATGCAnGTTGTCCCtGTCTCCAGTrGGTGGAATGGGAAAAGGCTCCCnST 
AACCCCATCTACCATCTmTtmCTnCCTGCCATGSTTCACAGTAAGAGATAGAAGC 
. TGCA(m6ACTTCTGGCTCTTTACAATGGTGAGCG6TGTGTGCCTGGTAAGGGAGAGCT 
GATGTCACTGCCCCAAATCCAGTAGTGAGATCTGAGTGTrCTGGTnCCTCWGC^ 
26604 GATTTGCAGCTGAGCCTGTCTATCTGGTGTGGGAAGAAGATGGGGAGnTACTTGTCAGTC
CCGGCmCTTCACCTCCAGAGACCTGTTTCGGTGAGTTGGTCTCCGAGnCCCCTCTCC 
ATCTCTCCTGGCCCCTGGTCCTGAGAGGAGGGTGGTCTCCGTAAATCTCCTTCTCACnA 
GTCCTTTACCATCGGTTCTGCCGGGCAGAAGCCAGCGGAQGTTATACCGAAGGAGAATeG ' 
GCCTtGTGAGGTACCCCCAmtGTCCTGGAAGtGGTGAGGGGAGGGAtATAX^^ 
[G,A]

AACTTCTTAGGGAGCTCCAGCTCCCCTTCTATCCCAGACAAACCTGAAGGAGCCTCCAAA 
AGATGCCACTGACCTGCCCATTGTAGATGmCTGCTTCCGGGGGGAATAGCCCAAATAG 
AGTGCTGTrTCCAGCTCTCACATGTCmCCTGCGGGCCATGCTGCCTGCCCAGGAATTT 
GTCXCAAeAAGCAGGATGGGCAGGTTTTGCCAAACTGTGGAAACTGGCAACTCCTGGGTG 
TGQGTAGCCT6GTA(7\CAGTAGGCACCTOtAAACGmGTTCTC7TAAfGGCAGGC^ 

27255 TGGGGAAAGACCTGGGCGAGTGCTTCTA/teACTGGAGCAATGGGCmAGAGTGTTCCTC 
AGCTGCTGGGCCAGCCCCCACACCTCCTCACTCCCTAGGCCTAAGTACCrrrc^^ 
. CTCTCTGTGQGGCTTCTCAGAGQGAGAT6TG6AAAGTCTACCT(nMCCT 
GCTCATTGCCCCACTCCACCTCCCATAGAAACTCCCGAGGGGGirrCTGOtoCTG^ 
CCCTTGT6AATGGAGCCATTCCAGGCTAGGGTGQGGI I IGN I ICATTCTrTGGGAGCAG 

[C,G]

CTGTTGTTGCAAAAAGGCTGCGTGGGGGTCAaAGTGGTGGTGGTGGACTrmCGGnGT 
GGCTrCTCTAAGCTAGGTCCAGTGCCCAGATCTTGGTGGCQGGATACTAGnCAGGTGGCG 
AGGGGGTGGGGAGAAAAGCAGTGTAGCATGTGGimGTGGAATCAGOGGA^ 
A7TGCTG6GAA6TGTCTGGACAGGGGGAAGGGGGAAGGGAACTGGTCCTCAATGCTGACT 
. GTAGGAAGGGGGCTGCTAGAGACTTrATCGmAATGTCTGAAGAGGGTAAAGAGATTAT 

27399 AGATGTGGAAACTCtACCTCTAACCTGGCTTTCmGCTGATTQCCCCACT 

ATAGAAACTCCCCAGGGGGmCTGGCCCTCTGGCTCCCnCTGAATGGAGCCAnCC^ 

GCTA66GT6GG GI I IGI H I GATTCTTTGGGAGCAGCaGTTGTTCCAAAAAGGCTGCCT 
CCCCCTCACCAGrGGTCCTGGTCGACTTITCCCTTCTGGCnCTCTAAGCTAGCTCCACT 
GCCCAGATCTrGCTGCCGGGATACTAGTGAGGTGQCCAGGCCCTGGQCAGAAAAGCAGTG 

U.G] 

ACCATGTGGtTTTGTGGAATGAGCGGACGCTGGTAGATTGGTGGGAAGnrGTGTGGACAGG 
GGGAAGGGGGAAGGGAAGTGGTGCTGAATGCTGACTCTAGGAAGGGCGCTGGTAGACACT 
mTGCmAATCTGTGAACAGGCTAAAGAGATTATATATGCGC ATnTA CAGATGAGGG 
AACGAGmGAAGAGAGTTAAGATATGGAGCGTCAGTGGGGAGGTtTTTGTGTGnCGTG 
AGmGTCTGATCCmAGG^GGCTGCAGGmGTTTTGTTGTGGTAGTGGAGAGGAAAT 

28088 AAGAGCCAATGGAAAnGATCnGAGTTOGGAGAAAGCrnTACATGTGGAAnAAGAT 
GGGAAGTGTTCAAGTAGCGAGAmGAQGTGCTGATTAATTrGTCTTAATCGTGGGAAGG 
CAGGmGGAGAAGGGTTGTrGCmAGGAGCGAGGAACTATACGCGTTmCGGTTGGA V 
GAGGGAGGGMGCGAGGGAGGAGACAAGmTCAGGAAGAGGAGAAGGTAGAGGAGATAG : 
TGAACTCTCAACGTGAACCTnAAGGGCCAGAGGAGTAATGCCACGCAAGTGCAGGTGCG 
[G,A]

mGTGTTGnCTGTGGGAGGOrrTGTGGAGAAGCTGATGTTGTTGCGCGTAGGGGGAAG 
GTGGGmGCGGAGCTAGAGTGTGGGGGGTACTGACTGAGTTrCGTAGACATTCTTGGGT 
TGGGCAAATAAGAGGCGAGAnCCTGAAGTGACnGTGAAGAGATAGGTGCGACACAGGG 
GTGTTTGCGGGGAGGGAGQGAGGAGGCAGAGGCTGTGGTGTGGCAQGTATCGGnACGAC 
ATCACTACCTGGTCAGAAAQCTGmCTGCCAnAGCCCaCGCTCTTlTATrATAGGAT 
28734 MGTAGAAGCTAGACTTCTTGGGCTCCT6AACAGGGTCCTTGCTGGATTCTGTGAAACAA
AnMGTTClTGACCCTAGGCCTCTGGGQGAGTACAAAGTCTATGGGAGnCTGGGGCTG 
TGGlTG(W\GGAAAGTGACGCAACCAGAnCCATGGGGACAT6ATCAGGC(jrGACATGTC 
: AGGGAGGAAGAGGGAGCAAGGGAATGAAGAATACAACnCTGTGTGCC^^^ 
CT6ACAQGC(7VTACATACTCAGCAGAGAATG(y^Cn^GTCTTTC(TAC(7^ 
[G.A]

AGTeAGCTGCAAmCCACTGIGCTTCCAACTAA(yW\ATACCT(W\ATTG6AAmACA 

AAAGAGGrAAATTAGGGAGTGGCTTTTGTCGGACATCmAAAQCATimCTT^ 

GAATTTCACTTAATGTCCAATACTGAmAATGAGCTTTGGGmAC^ 

AGAAAACAAATGAACCmGTGTTCCAAAGCAATCCATGmAAAGGGAAAAAATO 

ATAACTCTGCCCAGCTTCACAGTAACCmGGCAGGTGCCTTAGGTCCTCTGQGACTCTT 

29246 AATCCATGTTTAAAGGGAAAAAATOTGCATAACTCTGCCCAGCnCACAGTAACCTTTG 
GCAGGTGCCmGGTCCtCTGGGACTCTmCCnATCTGAAAAAtGAAGGAOtGfi^^ 
AQGTGAATGGnCCCAGCTCTGCMcnATGTGGGTCCTCAGAGGCACACAAGCTCTm 
CCATTATTtGCCAAATAATGGAGGCCCTGTCnnrAACTGCASTAC^^ 
nGAAACTACAGTCrTCCTGGTTmGCTTGGMCTGAATaCT 
[-.T]

ATncnGCTGTTCGTAGGCTTCAmTGTGmGGTTAATTTmAAAAOUCAATAAC 
ATATTCCATAATAATOCAGCnAATTGGCAWCTGTTTMCT^^ 
AGGAGGAGTAATAAAGGGATTmGACTGAGCTCTTATGGAACAGAGTCTCT^^ 
CTGTCATATCTGCCCTTCTGGGCCCT6GGGAAAAGTTGGCATCCCCAG7TGTGGTGCTCT 
CCAGGTQCCCTCAGGCTGTGGTGGAGGGAGCTTCCCAmTCTCm 

29490 AACTACAGTCnCCTGGTltrTGGnGGAACTGAATCASTG(^^ 

TCnGCTTGTTCGTAGGCrrCAnATGTGtTTGGTTAATTTTtTAAAACM^^ 

mCATAATAATOCAGCTTAATTGGCAGACTGTTTCASTCTATAGGATCTGCAGGAAGG 

AGGAGTAATAAAGGGATTmGACTGAGCTCnATQGAACAGACTCTaCTAGGCCCC^^ 

TtV^TATCTGCCCmTGGQCCCTGGGGAAAAGnGGCATGCCCAGTrGTGGTGCTCTCCA 

[G.A]

GTGCCCTCAGGCTGTGGTQGAGGGAGCmCCAnCTCTCCTTCAGCCCACTCAAnCAG 
AGGCTAGGGGCTGAAAGAAGCnCTCTACAACTGGCTGrrTCACTGQGAGGTTAAGGGATG 
ACCATCCAGCCAGGCCTrCCTCAGGACATGGGAGGQCnATGCTTTAACATGTGTAAATC 
CACTGCAATAATGACTGGTTCTmACCCCATAAGGTTGAGAAmACCTGTAAACATTT 
TTGTCTGAAGAATTTGGATGTAAGTGAGGGCTGGGCCTCTATCnATCTCACTTGGCTTC 

29934 GGACATGGGAGGGCTOTGCmAACATGTGTAAATCCACTGCAATAATGACTGGTTCTT 
mCCCaTAAGGTTGAGAAmACCTGTAAACATTmGrCTCAA^ 
GTGAQGGCTGGGCCTCTAtCmTCTCACnGQCTTCTCTCAGC/yCAGCACC^ 
TTGnCTtACACATCCTAGATGCACAGTMGTATTtCCTAAmTrA^^ 
ATCAATrGATnCAGCTGGGCnGGTGGCTCCnCCTGTAATCCCAGCACTTTGGGAGGC 
[T.C] 

AAGGCTGGAGGATCACCTGAGTCCAGGAGnTAAGACCAGCCTGGGCAACATAGGGAGAC 
CCTGTCTCTACAAAAAATAAAAAAnAQCCAGGCATGGTGGllGTGCACCtGTAGTCCCAG 
CTACTCAGGAGGCTGAGGCAGGAGGATCTCTTGAGCCTGGGAGGTCAGACTACAGTGAGC 
MTGATTGTGCCACTGCACTCCAGCCTGGGTGACAGAGTAAGACTCTGTCimAAAAAA 
AAAAAAAAAAAAGnGATnCTATTTGGATAGATAAATAAnCATmAGGACC^

34480 CTGACTTCAAGT6ATCCACCCGCCTCGGCCTCCCAAAGTGCTGGGATTATAAGCATAAGC
CACTGTGCCCAGCTGCTCTCTATATTTTTAATACATAmTTTCCATTAAT^ 

AGTTCATTTTATA6ATGAGGAAACTAGGCCAGAGAA6TAAAATATCTTGCCCAAGATGAT 
: GTMCTAGTAAGTGGGAGGATCAAGAmAAACCAAGCAATi^^ \ 

MGAATGTGGCCACTGTQGAAGGTGCAAGGCCtrGACAACAAGMTAGGGAAAA^ 
[A.G] 

CTAGAAGGAAAGAGATGGCATGGGCTCAGCAGGCCAGGGAGCTCTTAGCTGTGTGTGTTG 
GGAAGCTCAGAAGGGAGGAAGAQGTTGTCTGTGCAGGTAAGTCCTGAGAACACACCAGAC 
TmGAGAGGTGGAGCTTCATAGCCAGGT CAmGGGGAGAAGGGAGCTATAGATTTm 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 IAGAGACGG6GTCTTACTATGTTGCCCAGGCTG 
GTCTT6AACTCCTGQGCTCAAGTGATCCTCCCACCTCAGCCTCCCAAAGTGCTGGGATTA 

38812 AMTCCAGCAGATCCATTGAGAGTnAAGCAGCAAGGTGnGTGACCAAGTTAAWm 
AGAAGGATCACTGGTATGGAGGTTGGAnGGAGAGGGSAAAGCCTAAAGGTATAGAGACT 
AGnAGGAAGCTATTGTAGGCTGGGCATGGTGGTTCATGCCTGTAATCTCAGCACnTGG 
GAGGCTGAGGTGGGA GGATTGC nGAGGCCAGGAGnGAAGACCAACCTGGCCAACATAG 
CAAGACCCCGTCTCTGTTmCTTAAnAAAAGAAAAGTCCAGACGTAGACATAGTGGCT 
[T.C] 

ACGCCTGrAATGCO^GCACTnGffiAGGCCAAGGTGGGCAGAnGCnGAGGTCAAGAGT 

TTGGGAnAQGCCAGGCGCAGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAG 

GTGGGCGGATCACAAGGTCAGGAGATCAAGACGATCCTGGCTAACACAATGAAACa^ 

CTCTACTAAAAGTACAAAAAnAGCCGGGCATSGTGGCGGACGCCTGTAGTCCCAGCTAC 

TCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCTAGGAGGCGGAGCTTGCTGTG^^ 

40731 GTTCTGTCCTATGTCTGTCTCTCGGATGAAGCTlGAGCTGGCTTTCAGAAa^^ 

TAGGAAAGGAACCAGCTGGCCAGGGACAGACtATGAGGATTGTGCTGACCCAGCTGCCCC 
TGTGGGGATCACAGTTTACAGCCAGAGCCTGTGCGGACCCAGCTGTCTGCCAGGnrCCT 
TAGAAACCTGAGAGTCAGTCTCTGTCCACTGAACTCCTAAGCTGGACAGGAGGCAGTGAT 

GCTAAACCCTGAAGGGCAACATGGCCTATGGAGAAA6CATGGAQCTCAGA6CCTGGAGTA 
[C.G] 

GGGCACAGA TAGGATTGA ATAAATTGTGTAGAAAGACTTTGAAAACAATAAAGCAAAAGA 

TGAATGAACGTTTTTmAGACnGAGGGACCAACAACCCCCAAACCCCAGATTC^^^ 

GGTCCATGGGGAAGGAGAAGTTGCCTTGAGTGGAAGCCCCAAGTAGGGAGACTTACAGAA 

AAGAAGTCAAGAGCACTGGCTCCCAGGCAGAAATACTGATACCCTACTGGGGCTTCAGGC 

TGAGCTCCTCCCnCACAAAT(yiCTTCATCTCTCT6AGCCTGTTTCTGCATCTGTGACAT 

41303 CTCTGAGCCTGTnCTGCATCTGTGACATAAGATGGTAAGATAAAGGTGGCTGTCTCACC 
AATTATGTAAGGAnAAATGTGGAAAAGGACATAAAGTTGTATAGTGCTGCCATAGGGAC 
AGTGTTCAGTAAACGTGACACAmnAGTATCACTAAGAATCAGGTTCnGGCCAGG^ 
CCGTGGCTCATGCCTGTAATC(mCACTCTGQGAGGCCTAGCTC€6AGGATGGCn 
CACAGGAGmGAGACCAQCCTGAGCAACATAGTlGAGACACrGTCTCTACAAAA^^ 
[T.A] 

AATAATAATAAnGTTTTTAATTAGATGGGCAGGGCACTGTGGCTCACACCTGTAATCCC 
AQCACTTreGGAGGCCAAGQCCGGAGGATTGCTrGAGGCCAGGAGTTCAGGAGCAGCCTG 
GGCCACATTCCTGTCTCTACAAAGAATAAAAAAGTTAACTGGGCATGGtGQCACATGCCT 
GTAATCCCAGCTACTCAAGAGQCTiGAGGAGGAGGAnGCCrGAGCCCAGGAGTTCAAGAC 
TGCAGTGAGCCnGATCACACCACTGTACTACAQCTTGGGCAACAGAGTGAGACCTTGTC

41305 CTGAGCCTGTrrCTGCATCTGTGACATAAGATGGTAAGATAAAGGTGGCTGTCTCACCAA
TTATGTAAGGAnAAATGTGGAAAAGGACATAAAGnGTATAGTGCTGCCATAGGGACAG 
TGlTCAGTAAACGTGACACAnClTAGTATCACTAAGAATCAGGTTCTTGQCCAGGC^ 
GTGGCrrCATGCCtGTAATCCCAACACTCTGGGAGGCCTAGGTCGGAGG^ 
GAfi6AGim6AGAC(mCTGAGCAACATAGT6AW^ 
[-.A] 

TAATAATAATTGTrrrTAATrAGATGGGCAGGGCACTGTGGCTCACACCTGTAATCCCAG 
CACTTTGGGAGQCCAAQGCCGGAGGAnGCnGAGGCCAGGAGTTCAQGAGCAGCCTGGG 
CCACATTCCTGTCTCTACAAAGAATAAAAAAGTTAACTGGGCATGGTGGCACATGCCTGT 

aatcccagctactcaagaggctgaggaggaggangcctgagcccaggagncaagactg 
cagtgagccttgatcacaccactgtactacagcngggcaacagagtgagacc^ 

41457 ctaagaatcaggttcttqgccaggcaccgtggctcatgcctgtaatc(x:aacactctiggg 
aggcctaqgtcqgaggatggcttgaacacaggagmgagaccyvgcctgagcaacatag^ 
gagacactgtctcta(ywwwwv\taataataataattgttmaattag^^ 
ggcagtgtggctcacacctgtaatcccagcactttgggaggccaaggccg^ 
tgaggccaggagttcaggagcagcctgggccacamctgtctctacaaagaataaaaaa 

[G.G] 

TTAACTGGGCATGGTGGCACATGCCTGTAATCCCAGCTACTCAAGAGGCTGAGGAGGAGG 
AnGCCTGAGCCCAGGAGnCAAGACTGCAGTGAGCCTTGATCACACCACTGTACTACAG 
CTTGGGCAACAGAGTGAGACCnGTCTCCAAAAAAAAAA GI i IGH 1 1 Ml I lA TCCACT 
CTCCTCACCAAACAAACTGAGTAAGnAGAGCCCTCTCAGCTGGCATGTGTTGGAAACAG 
TGCCCTCTCATTAAAGTGCTGCCCTGACTCCCAnGCCTCTTGGCCTrGGTCAGTA 

43168 AGCTACTTGGGAGGCTGAGGGAGGAGAATCGCnGAACCTGGAAGGCQGAGGTCGG^ 

ASCCGAGATCGTCCCATTiGCACTrCAGCCTGGGCGACAGAGCGAGACTCTGTCTCAAAAA 
TAATAATAATAACAATAACTAQCCGGGCCTGGTGGCACATGCCTGTACTCCCAGTTACTC 

AGGAGGCGGAGGCATGAGACTCAGCTGMCTAGGGAGACAGAGGITGCAGTGAGCCA^ 
TCACACCACTGCACTCCAGCCTGGnGACAGAGCGAGACTCTGTCTCAAAAAAAAAAAAA 
[A.-.T] 

CCCATTTCCTCATTTrnGGATACTAGTATAACTATCACTCTAAACCAGnAGTAC™ 

ATCAAGCAGATATGGGAGATGGTGAAnACCATCTACAGTGTTGTCATATATGTCACATA 

CTGAGCATTATCAGCTAGTAGAATCTAGnMnGTrCTATGTGTGATGTATGCAGAGTT 

CCCATTTTGAATGTGTTTTTACTATGCnAAATAAATGAaGATGTCAQt^ 

TGATACATCTGATGTAAGAGCCCCTGTTCCCCAATMTAACATCTAAACTATAGACAn^ 

43357 AGGCATGA6ACTCAQGTGAACTAGGGAGACAGAG6TTGCAGTGAGCCAAGATCA(y\C^^ 
TGCACTCCAGCCTGGTTGACAGAGCGAGACTCmCTCAAAAAAAAAAAAATCCCATnG 
CTCATTTmGGATACTAGTATAACTATCACTCTAAACCAGmGTACn 
GATATGGGAGATGGTGAAmCCATCTACAGTGnGTCATATATGTCACATACm 
TATCAGCTAGTAGAATCTAGrrAATTGnCTATGTGTGATGTATGCAGAGTTCCCATTTT 
[T.G]

AATGTGTTmACTATGCnAAATAAATGACTGATGTCAGCAACCCCAAAATG^^^ 

TGAT6TAAGAGCCCCT6nCCC(W^TAATAACATCTAAACTATAGACATTGGAAT6A^ 

GGTGCCCCTAAGmCCTC(XTCWGGGmCTTB3CCGGTCTCT6AG6ACTACACA^ 

CTACTCCCGTCmCCTCATCTTCAGGCGCAGTAACAGTATCTCCAAGTCCCCTCG^ 

AGCTCCCCAAAQGAGCCCCTGCTGTTCAGCCGTGAMTCAGCCGCTCAGAATCCCr^

45664 CCAGCTTTCCnGGCTTCCCCCACCCCCAGGTGAAAGTGATGCQCAGCCTGGACCACCCC
AATGTGCTCAAGTTCATTGGTGTrGCTGTACAAGGATAAGAAGCTGAACCTGCTGACAGAG 
tACATTqweeQSGGCACACTGAAGGACmCTKGCAGnrATGGTGAGCACACCACCCCAT 

AffraCCAGGA^CCnGGtGGGTtGTCAGACACCTATQCTATCACTACCCTAOlAGCTTA 
MGGQCAGAGGGGCCCTQCT7TGCCTCCAAA(3GACCATGCTGGGTOGGA(^ 

[T.C]

AGGGAQGCTTCACTGGGAGACCACATTGACCCATGGGGCCTGGACCACGAGTGQGACAGG 

GCTCAACAGCCTCrGAAAATCAmCCCAnCTCCAG6ATC(mCCCCTGGCAQCAGAA 

GGT(y\imTGCCAAAGGAATCGCCTCCGGAATGGTGAGTCCCACCAACAAAa^ 

CAGGQCGAGAGTAGGGAGAGGTGTGAGAATTGTGGGCnCACTGGAAGGTAGAGACCCCT 

TCCTATGDVACTTGTGTGGGCTGGGTCAGCAGCTAmAnGAGTrTGTCTCTGTCAa^ 

47549 AAnAGGTGGGCGTGGTGGTGCACGCCTGTAGTCCCAGCTACTCAGGAGGCCGAGGCAGG 
AGAATAGCTTGAACCTGGGAGQCAGAASTTGCAGTGAGCCAAGATCACACCACTGCATTC 
CAGCCrGQGrrGACAGAGTGAGACTrCATCTCAAAAAAAAAAAAAAAGAGAGACTGATATG 
GTTAGTACATTGGGCTGGAATGCGGAGQGTCCAGGGAATGGAGCCCTQCATAGGGGGCTA 
AfGAAACAmCAGATTTCTGAATTAAGGTAGTGGCTGTGQGGACAGGAGCCTGGGAGGC 

[A.C]

GG6TGGAGTCAGAATGGAGAGACTGGTTGGCAAT6AGGGAACAQ6AGGAGGAGGAGGAGG 
AGTTACGAGTGQCTTGAGGTGTCACnACCAGACATrTGGGGGATGGGGGATAGCCGTGA 
nGTTGAGCAACTGGTTTGGGAAGAGCTAGCATTGATCCCTGCTGnCTGTGCTAGCAGA 
ACCTATCAGCATCTrCTGGGCAGGAAACTGQCTCCATGAGACTGGCTTAGGGAGAGGCTG 
CTAGTCACCTAATCTGCAGAGAAG6GGCAGCTGGAGCTGTGQGA(yVGAAGAGGCATCCAT^ 

47908 GGAGmCGAGTGGCnGAGGTGTCACmCCAGACAmGGGQGATGGGGfiAT^ 

GAnGnGAGCAACTGGmGGGAAGAGCTAGCAnGATCCCTGCTGnCTGTGCTAGCA 
GAACCTATCAGCATCnCTGGGCAGGAAACTGGCTCCATGAGACTGGCTTAGGGAGAGGC 
TGCTACTCACCTAATCTGCAGAGAAGGGGCAGCTGGAGCTGTGGGACAGAAGAGGCATCC 
ATGTAQCTGGTGGGGGTGTaCAGCTTGTGAAGAGGAGATGGCinTGAGCAGGGCTGACA 

[C.A]

TGAAAAGGCTGGAA6AAAAAAACAGACACACAAGAGTCTCAGGATCAG6TAGCATAGGAA 
AGTrGTGGACAGTCTTrGAGGAGCACTCCCTCAGGCAGGCAGGCAGGCAGGTCATGAGCT 
ATAGCGATTCAGGAAGAGCTCCCTGGGTGTGTGAGCAQCTCCAGGAGCCTAAGGGATGAA 
AGTAGTATTGCAGGGGGCTGGAGAGCAAGGAGTGGCTCCnCTACATTTGCAAGQGAAGG 
AGAAAGGAAGTTGCTCCTGAGAGTGGTAAGAGTCAGTGGTGGAGQCCTGGAGAGGAGACA 

522157 TTGTGAGGGGTAGAGGAGAGGAGAGACAAGQGATQGTTAGGATAATGAAGGAATGTnTG 
t H i im IN IG TTTTTSAGATGGAGmCACTCTGrCACCCAGSCTGGAGTGCAGAGGT 
GCAATCTT66CTCACTGCAGCCT(XGCCtCCCAGglTCAAGC^^ 
TCCCyVVGTAGCtTGGGACTACAQGTGTGCGCCACCAeGCCTGQC^ 
GTAGAGACAGGGmCGCCATAnGGCCAGGCTGGTCTCAAATGCCTCACCT^ 

[C.A]

CACCCGCTTCAGCCTCCCAAAGTGCTGAGATTACA6GCATGAGCTACCGTQCCTG6CCAT 
GAAGGAAGAmGTmAAAAAATTGTTrrCTTTAATATrAAnGAACACCTCTGnCAG 
AGCACTGGGCTGGTGCCAGAGGGTTTCA6ACATGAATCAGATCCAGCACCTCATAGAGCC 
TTAATCTQGCACACACACACAGCCACAAGGAGACACAGACAAGQCAGGGTAGGATGAGre 
GAAGCTAGGAGCAGATGCTGAmGGAACACnGGCnCTGCAGTGAAGCCCCTTCTrAG

54654 GQCCCCGQ(XCCGGCCCCCAGGCCAGGCACTGGCGGCCMQGACCACGCATCTACTTTCA
GAGCCCCCCCCGGGGCCGCAGGAGAGGGCCCGGGCTGQGCGGATGATGAGGGCCCACT6A 
- GGX:GC(y\AGGGAAGGTCACCAT(yyAGTATGACCCCAAG6AQCTACGGAAGCAa^^ 
tAGAGGACTGGATCCTGGAGCAGCT(y^CGCaCTCTACGACTGCCAG6AAGAG^ 
CAGAACTAGAGATTGACGTGGATGAGCtCCTCGACATGGAGAGTGACGATK^^ 
[T.C] 

CAGGGTCAAGGAGCTGCTGGTTGACTGnACAAAGCCACAGAGQCCnCATCTCTGGCCT 

GCTGGACAAGATCCQ6GCCAT6CAGAAGCTGmCACCCCAGAA6AAGTGAG^ 

GACCCAGGC6AACGGTG6CTCCCATAGGACAATCGCTACCCCCC6ACCTCGTAGCAACAG 

CAATACCGGGGGACCCTGCGGCCAGGCCTGGTTCCATGAGCAGG GCTCCtCG TGCCCCTG 

GCCCAGGQGTCTCTTCCCCTGCCCCCTCAGTTTTCCACTmGGATTnrnATrGW^ 

54679 GGCAGTGGCGGCCAAGGACCACGCATCrACmCAGAGCCCCCCCCGQGGCCGCAGGAGA 
GGGCCCGGGCTGGGCGGATGATGAGGGCCCAGTGAGGCGCCAAGGGAAGGTCACC^^ 
GTATGACCCCAAGGAGCTACGGAAGCACCTCAACCTAGAGGAGTiGGATCCTGGAGCAGCT 
CACGCGCCTCTACGACTGCCAGGAAGAQGAGATCTCAGAACTAGAGAnGACGTGGATGA 
GCTCCTCGACATGGAGAGTGACGATCCCTGGGCnCCAGGGTCAAGGAQCTGCTGOT 
[C.G] 

TGTTACAAACCCACAGAGGCCTTCATCTCTGGCCTGCTGGACAAGATCCGGQCCATGCAG 

MGCTGAGCACAeCCCAGAAGAAGTGAGGCTCCCCGACCCAGGCGAACGCT 

AGGACAATCGCTACCCCCC6ACCTCGTAGCAACAGCAATACCGGGGGACCCTGCGGCCAG 

GCCTGGnCCATGAGCAGGGCTCCTCGTGCCCCTGdCCCAGGGGTCTCtTCCCCTGCCCC 

CTCAGTrrrCCACTmGGATTTTmATTGnATTAAACTWGGGAC^ 

54693 AGGACCACGCATCTACTTTCAGAGCCCCCCGCGGGGCCGCAGGAGAGiaGCCC^ 

CGGATGATGAGGGCCCAGTGAGGCGCCAAGGGAAGGTCACCATCAAGTATGACCCCAAGG 
AGCTACGGAAGeACCTCAACCTAGAGGAGTGGATCCTGGAGCAGCTCACGCGCCTCTACG 
ACTGCCA6GAA6AGGAGATCTCAGAACTAGAGATTGACGTGGATGAGCTCCTGGACATGG 
AGAGTGACGATGCCTGGGCnCCAGGGTCAAQGAGCTGCTGGnGACTGTTACAAACCCA 

[A.C] 

AGAGGCCnCATCTCTGGCCTGCTGGACAAGATCCGGGCCATGCAGAAQCTGAGCACACC 

CCAGAAGAAGTGAGQGTCCCCGACCCAGGCGMCGGTQGCTCCCATAGGA(y\ATCGCT^^ 

CCCCCGACCTCGTAGCAACAGCAAtACCGGGQGACCCTGCGGCCAGGCCTGGTTCCATGA 

GCAGGQCTCCTCGTGaCCTGGCCCAGGGGTCTCTTCCCCTOCrcCTCAGTm 

TTreGATTTTmATTGTTAnAAACTGATGGGACnTTGTGTTm 

54706 TACTTTCAGAGCCCCCCCCGGGGCCGCAGGAGAGGGCCCGGGCTGGGCGGATGATGAGGG 
CCCAGTGAGGCGCCAAGGGAAGGTCACCATCAAGTAT6ACCCCAAG6AGCTACGGAAGCA 
CCTCAACCTAGAGGAGTGGATCaGGAGtmrCACGGGCCTlCTACGACTGCCAGGAA^ 
GGAGATCTCAGAACTAGAGAnGACCTQGATGAGCTCCTGGACATG^ . 
CTGGGCnCCAGGGTCAAQGAGCTGCTGGTTCACTGnACAAACCCACA^ 
[T.C] 

TCTGGCCTGCT66ACAAGATCCGGGCCATGCAGAAGCTGAGCACACCCCAGAAGAAGTGA 

GGGTCCCCGACa^AGGCGAACQGTGGCTaCATAQGACAATCGaACCCCCCGACCTCGT 

AGCAACAGCAATACCGGGGGACCCTGOSGCCAGGCCT GGTrCC ATGAGCAGG GCTCC^^^ 

TGCCCCTGGCCCAGGGGTCTCTTCGCCTCCCCttTCAGrrrrCW 

AnGnAnAAACTGATGQGACmGTGmTOTATTGACTCTGCGQCACQGGCCCTTT

54712 CAGAGCCCCCCCCGG6GCCGCAGGA6AQGGCCCGGGCTGGGCGGAT6ATGAGGGCCCAGT
GASGCGCCMGGGAAGGTCACCATCAAGTTATGACCCCMGGAGCTACGGAAGCACCTCAA 
aTAGAGGAGTOGATCCTCGAGCAGCTCACGCGGCTCTACG^^ 
CTCAGAACTAGAGATTGACGTGGATGAGCTCCTGGACATGGAGACTGACGATGCCTGGQC 
TTC(mGTCAAGGAGCTGCTGGTTGACTGnACAAACCCACAGAG(K:cnCATC^ 
[T.C]

CTGCTQGACAA6ATCCGGGCCATGCAGAAGCTGAGCACA(X:CCAGAAGAAGTGAGQGTCC 

CCGACCCAGGCGAACGGTGGCTaCATAQGACMTCGCTACCCCCCGACCTCCTAGCAAC 

AGCAATACCGGGG6ACCCTGC6GCCAG6(XTGGTTCCATGAGCAGG GCTCCTC6 TGCCCC 

TGQCCCAGQGGTCTCnCCCCTGCCaCTCAimrCCAmTTGGWTTTm 

AnAAACTGATGGGACmGTGTTTrTATATTGACTCTGCGQCACGGGCCCTrTAATAAA 

54799 6TATGACCC6\AGGAGCTACGGAAQCAGCTCAACCTAGAGGAGT(^^ 

CACGCQCCTCTACGACTGCCAGGAAGAteAGATCTCAGAACTAGAGATTGACG^ 
GCTCCTGGACATGGAGASTGACGAtQCCTGGGCTTCCAGGCTCAAQGAGCra^ 
CTGTOCAAACCCACAGAGGCCTrCATCTCTGQC(rrGCTCGA(y\AGATCCG^ 
GAAGCTGAGCACACCCCAGAAGAAGTGAQGGTCCCCGACXCAGGCGAACGGTGGCTCCCA 

[T.C] 

AGGACAATCGCTACCCCCCGACCTCCTAGCAACAGCAATACCGGGGGACCeTGCGGCCAG 
GCCTGGtTCCATGAGCAGGGCTCCTCGTGCCCCTGGCCCAGGGGTCTCnCCCC TGCCCC 
CTCA6TmCCACTm6GATrrnTTATTGTTATTAAACTGATGGGA(^ 1 1 1 1 
ATATTGACTCTGCGGCACQGGCCCmAATAMGCGAGGTAGGGTACGCCTTrGGTGCAG 
CTCAAAA/V\AAAAAAAAAMTCAmeCAQtG6TCCACAmGAG^ 

54819 GGAAGCACCTCAACCTAGAQSAGTGGAT(XTGGAGGAGinT:ACGCa^^ 

AGGAAGAGGAGATCTCAGMCTAGAGAnGACCTGGATGAGCTCCTGGACAT^ 

ACGATQCCTGGGCTTCCAGGGTCAAGGAGCTGCTCGTTGACTGTTACAAACCCACAGAGG 

CCnCATCTCTGGCCTGCTGGACAAGATC(mCATGCAGAAG™a:ACAa^^ 

AGAACTGAGQGTCCCCGACCCAGGCGAACfiGTGGCTCCCATAQGA(yVATCGC^^ 
[G.A] 

ACCTCGTAQCAACAGCAATACCGGGQGACCCTQCGGCCAGGCCTGGTTCCATGAGCAGGG 
CTCCTCGTGCCCaGGCCCAGGGGTCTCTTaC CTGCCCCC TCAGTTTTOACTTTT^ 
TTTTmATTGmTTAMCTGATGGGACTITSTGTnTTATAnGACTCTBCGGCACGG 
GCCCTTTAATAAAGCGAGGTAGGGTACGCCmGGTGCAGCTCAAAAAAAAAAAAAAAAA 
TGAmCCAGCGGTCCACATOGAGTTlSAMWCTGGTGGGAGAATCTATACCnGTT 
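
The SNP entries in this listing follow a repeating layout: a nucleotide position, several lines of 5' flanking sequence, the alternative alleles in square brackets, and several lines of 3' flanking sequence. As an illustration only, the following Python sketch shows one way such entries could be read once the text has been cleaned of extraction errors. The names `SnpRecord` and `parse_snp_records`, and the `[X.Y]` allele pattern, are assumptions made for this example; they do not come from the application or from the cited reference.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical patterns for a cleaned listing (not the raw scanned text).
START_RE = re.compile(r"^(\d+)\s+(.+)$")                    # e.g. "27255 TGGGG..."
ALLELE_RE = re.compile(r"^\[([ACGT-](?:\.[ACGT-])+)\]$")    # e.g. "[C.G]" or "[A.-.T]"


@dataclass
class SnpRecord:
    position: int                                  # nucleotide position of the SNP
    upstream: str = ""                             # 5' flanking sequence
    alleles: List[str] = field(default_factory=list)
    downstream: str = ""                           # 3' flanking sequence


def parse_snp_records(text: str) -> List[SnpRecord]:
    """Group listing lines into records: position, 5' flank, alleles, 3' flank."""
    records: List[SnpRecord] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate blocks
        start = START_RE.match(line)
        if start:
            # A leading integer opens a new record; the rest is 5' flank.
            current = SnpRecord(int(start.group(1)), start.group(2).replace(" ", ""))
            records.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first position line
        allele = ALLELE_RE.match(line)
        if allele and not current.alleles:
            current.alleles = allele.group(1).split(".")
        elif current.alleles:
            current.downstream += line.replace(" ", "")
        else:
            current.upstream += line.replace(" ", "")
    return records
```

A parser of this kind would reject or mis-group the raw scanned lines, since many of them contain misread characters; it is meant only to make the record structure explicit.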

55499 nGTmCTAATACacnGTCAnaAAATATCmAATITATTAAAAAATATATATAT 
ACAGTATT6AATQCCTACT6TGTGCTAGGTACAGTTCTAAACACTTGGGTTACAGCAGCG 
M(yW\ATAAAG6TGCTtACCCTCATAGAAGATAGATT(nAGCATGGr^^^^ * 
ATACAGTAGATACAATAAGTAAACTATATTIGMTATOGAATGTGGCAGATGCTAT^ 
AAAGAGTCAAGACAAGtAAAGACGAnGnCAGQGTACCAGTTGCAATTTTAAATATGGT 

[C.T]

GTCAGAGCAGGCCTCACTGAGGTGACATGACATTTAAGCATAAACATGGAGGAGGAGGAG 

TAAQCCTCAGCT6TCmGGCTTCCGQGQCAGCa\AGCCATTTCCGTGGCACTAGGAGC^ 

TGGTGTTTCCGAnCCACCTTTGATAACTGCATTTTCTCTAAGATATGGGAGGGAAGTn 

nCTaTATTGTTmAAGTATTAACTCCAGCnVVGTCCmCTTGnATAGTGn^ 

ATCmATAGCAAATATATGAGGTACCGGTAAWmTGCCCAmCTCACAGAGGCACT

56825 ACTGATGGCTCAAAGGGT6T6AAAAAGTCAGTGATGCTCCCCCTTTCTACTCCAGATCCT

GTCCTTCCTGGAGCMGGTTGAGGGAGTAGGTTTTGAA^^^ 

ACAGGCCAGGACnTAGAGAAAGGGCTGGCmTGTtTACCTaTCACTGGCTCT^ 
CCCAGGG/lCCACATCMTGreAi3AGGMGC(TCCACC^^^ 
GAGACTQGCTCAGAACmCQGACAACATCCmCTCTCTCAAAC^ 
[C.A]

AfiGAAGAGGaGQGGGACTAGAAAGAGGCCCTGCCCTCTAGAAAGCTCAGATCTTGGCTT 

CTGTOCTCATACTCGGGTGGGCTCCTOGTCAGATGCCTAAAACAT^ 

CGATGGGTTaGGAGGACAGTGTQGCTTGTCACAGGCCTAGAGTCTGAGGGAGGGGAGTG 

GGAGTCTaQCAATCTCnG(n^CTTCGCTTCATGQCAACCACTGCTCACCC^ 

CCTGGTTOQGCAGCAGCTTimTGGGAAGAGSTGGTGGCAGAGTCTeA^ 

58871 CGTCACCCACCACCCAACCCCTGGCGCACTCCAGCCmAA(^^ 

CATmAACTACOPCCACCnGGAAACAATtGCTGiWGGGGAGAGGAm 

ACCACCTTGnGGGACGCCTQCACACCTCTCTTTCCTGGTTCAACCTGAAAGATTCCTGA 

TGATlGATAATCtG6ACACAGAAGCCGGGCACGGTGG(rraAGCCTGTAATCT(7^GCA^ 

TGGGAGGCCTTMGCAGGTGGA71CAeC7GAGATCAAGAGlTrGASAA(y\GCCT^^ 

r.A] 

GGTGAAACCCCCTCTCTACTAAAAATACAAA^^ 

TAATCCCAGCTACrrCTGGAGGCTGAimGGAGAATCGCTTGAACCCAC^ 

TGCACTGAGaGAGATCATGCCAHGCACTCCAGCCTCTGCAACAAGAGCCAAACTC^ 
CTCAAAAAAAAAAA

ISOLATED HUMAN KINASE PROTEINS, NUCLEIC ACID MOLECULES ENCODING HUMAN KINASE PROTEINS,
AND USES THEREOF

FIELD OF THE INVENTION

The present invention is in the field of kinase proteins that are related to the
serine/threonine kinase subfamily, recombinant DNA molecules, and protein production. The
present invention specifically provides novel peptides and proteins that effect protein
phosphorylation and nucleic acid molecules encoding such peptide and protein molecules, all of
which are useful in the development of human therapeutics and diagnostic compositions and
methods.

BACKGROUND OF THE INVENTION

Protein Kinases

Kinases regulate many different cell proliferation, differentiation, and signaling processes by
adding phosphate groups to proteins. Uncontrolled signaling has been implicated in a variety of
disease conditions including inflammation, cancer, arteriosclerosis, and psoriasis. Reversible
protein phosphorylation is the main strategy for controlling activities of eukaryotic cells. It is
estimated that more than 1000 of the 10,000 proteins active in a typical mammalian cell are
phosphorylated. The high energy phosphate, which drives activation, is generally transferred
from adenosine triphosphate molecules (ATP) to a particular protein by protein kinases and
removed from that protein by protein phosphatases. Phosphorylation occurs in response to
extracellular signals (hormones, neurotransmitters, growth and differentiation factors, etc.), cell
cycle checkpoints, and environmental or nutritional stresses and is roughly analogous to turning
on a molecular switch. When the switch goes on, the appropriate protein kinase activates a
metabolic enzyme, regulatory protein, receptor, cytoskeletal protein, ion channel or pump, or
transcription factor.

The kinases comprise the largest known protein group, a superfamily of enzymes with widely
varied functions and specificities. They are usually named after their substrate, their regulatory
molecules, or some aspect of a mutant phenotype. With regard to substrates, the protein kinases
may be roughly divided into two groups; those that phosphorylate tyrosine residues (protein
tyrosine kinases, PTK) and those that phosphorylate serine or threonine residues
(serine/threonine kinases, STK). A few protein kinases have dual specificity and phosphorylate
threonine and tyrosine residues. Almost all kinases contain a similar 250-300 amino acid
catalytic domain. The N-terminal domain, which contains subdomains I-IV, generally folds into
a two-lobed structure, which binds and orients the ATP (or GTP) donor molecule. The larger C
terminal lobe, which contains subdomains VI A-XI, binds the protein substrate and carries out
the transfer of the gamma phosphate from ATP to the hydroxyl group of a serine, threonine, or
tyrosine residue. Subdomain V spans the two lobes.

The kinases may be categorized into families by the different amino acid sequences (generally
between 5 and 100 residues) located on either side of, or inserted into loops of, the kinase
domain. These added amino acid sequences allow the regulation of each kinase as it recognizes
and interacts with its target protein. The primary structure of the kinase domains is conserved
and can be further subdivided into 11 subdomains. Each of the 11 subdomains contains specific
residues and motifs or patterns of amino acids that are characteristic of that subdomain and are
highly conserved (Hardie, G. and Hanks, S. (1995) The Protein Kinase Facts Books, Vol 1:7-20
Academic Press, San Diego, Calif.).

The second messenger dependent protein kinases primarily mediate the effects of second
messengers such as cyclic AMP (cAMP), cyclic GMP, inositol triphosphate, phosphatidylinositol
3,4,5-triphosphate, cyclic-ADPribose, arachidonic acid, diacylglycerol and calcium-calmodulin.
The cyclic-AMP dependent protein kinases (PKA) are important members of the STK family.
Cyclic-AMP is an intracellular mediator of hormone action in all prokaryotic and animal cells
that have been studied. Such hormone-induced cellular responses include thyroid hormone
secretion, cortisol secretion, progesterone secretion, glycogen breakdown, bone resorption, and
regulation of heart rate and force of heart muscle contraction. PKA is found in all animal cells
and is thought to account for the effects of cyclic-AMP in most of these cells. Altered PKA
expression is implicated in a variety of disorders and diseases including cancer, thyroid
disorders, diabetes, atherosclerosis, and cardiovascular disease (Isselbacher, K. J. et al. (1994)
Harrison's Principles of Internal Medicine, McGraw-Hill, New York, N.Y., pp. 416-431, 1887).

Calcium-calmodulin (CaM) dependent protein kinases are also members of STK family.
Calmodulin is a calcium receptor that mediates many calcium regulated processes by binding to
target proteins in response to the binding of calcium. The principal target protein in these
processes is CaM dependent protein kinases. CaM-kinases are involved in regulation of smooth
muscle contraction (MLC kinase), glycogen breakdown (phosphorylase kinase), and
neurotransmission (CaM kinase I and CaM kinase II). CaM kinase I phosphorylates a variety of
substrates including the neurotransmitter related proteins synapsin I and II, the gene
transcription regulator, CREB, and the cystic fibrosis conductance regulator protein, CFTR
(Haribabu, B. et al. (1995) EMBO Journal 14:3679-86). CaM II kinase also phosphorylates
synapsin at different sites, and controls the synthesis of catecholamines in the brain through
phosphorylation and activation of tyrosine hydroxylase. Many of the CaM kinases are activated
by phosphorylation in addition to binding to CaM. The kinase may autophosphorylate itself, or
be phosphorylated by another kinase as part of a "kinase cascade".

Another ligand-activated protein kinase is 5'-AMP-activated protein kinase (AMPK) (Gao, G. et
al. (1996) J. Biol Chem. 15:8675-81). Mammalian AMPK is a regulator of fatty acid and sterol
synthesis through phosphorylation of the enzymes acetyl-CoA carboxylase and
hydroxymethylglutaryl-CoA reductase and mediates responses of these pathways to cellular
stresses such as heat shock and depletion of glucose and ATP. AMPK is a heterotrimeric
complex comprised of a catalytic alpha subunit and two non-catalytic beta and gamma subunits
that are believed to regulate the activity of the alpha subunit. Subunits of AMPK have a much
wider distribution in non-lipogenic tissues such as brain, heart, spleen, and lung than expected.
This distribution suggests that its role may extend beyond regulation of lipid metabolism alone.

The mitogen-activated protein kinases (MAP) are also members of the STK family. MAP
kinases also regulate intracellular signaling pathways. They mediate signal transduction from the
cell surface to the nucleus via phosphorylation cascades. Several subgroups have been identified,
and each manifests different substrate specificities and responds to distinct extracellular stimuli
(Egan, S. E. and Weinberg, R. A. (1993) Nature 365:781-783). MAP kinase signaling
pathways are present in mammalian cells as well as in yeast. The extracellular stimuli that
activate mammalian pathways include epidermal growth factor (EGF), ultraviolet light,
hyperosmolar medium, heat shock, endotoxic lipopolysaccharide (LPS), and pro-inflammatory
cytokines such as tumor necrosis factor (TNF) and interleukin-1 (IL-1).

PRK (proliferation-related kinase) is a serum/cytokine inducible STK that is involved in
regulation of the cell cycle and cell proliferation in human megakaryocytic cells (Li, B. et al.
(1996) J. Biol. Chem. 271:19402-8). PRK is related to the polo (derived from humans polo gene)
family of STKs implicated in cell division. PRK is downregulated in lung tumor tissue and may
be a proto-oncogene whose deregulated expression in normal tissue leads to oncogenic
transformation. Altered MAP kinase expression is implicated in a variety of disease conditions
including cancer, inflammation, immune disorders, and disorders affecting growth and
development.

The cyclin-dependent protein kinases (CDKs) are another group of STKs that control the
progression of cells through the cell cycle. Cyclins are small regulatory proteins that act by
binding to and activating CDKs that then trigger various phases of the cell cycle by
phosphorylating and activating selected proteins involved in the mitotic process. CDKs are
unique in that they require multiple inputs to become activated. In addition to the binding of
cyclin, CDK activation requires the phosphorylation of a specific threonine residue and the
dephosphorylation of a specific tyrosine residue.

Protein tyrosine kinases, PTKs, specifically phosphorylate tyrosine residues on their target
proteins and may be divided into transmembrane, receptor PTKs and nontransmembrane,
non-receptor PTKs. Transmembrane protein-tyrosine kinases are receptors for most growth
factors. Binding of growth factor to the receptor activates the transfer of a phosphate group from
ATP to selected tyrosine side chains of the receptor and other specific proteins. Growth factors
(GF) associated with receptor PTKs include: epidermal GF, platelet-derived GF, fibroblast GF,
hepatocyte GF, insulin and insulin-like GFs, nerve GF, vascular endothelial GF, and macrophage
colony stimulating factor.

Non-receptor PTKs lack transmembrane regions and, instead, form complexes with the
intracellular regions of cell surface receptors. Such receptors that function through non-receptor
PTKs include those for cytokines, hormones (growth hormone and prolactin) and
antigen-specific receptors on T and B lymphocytes.

Many of these PTKs were first identified as the products of mutant oncogenes in cancer cells
where their activation was no longer subject to normal cellular controls. In fact, about one third
of the known oncogenes encode PTKs, and it is well known that cellular transformation
(oncogenesis) is often accompanied by increased tyrosine phosphorylation activity (Carbonneau
H and Tonks NK (1992) Annu. Rev. Cell Biol. 8:463-93). Regulation of PTK activity may
therefore be an important strategy in controlling some types of cancer.

LIM Domain Kinases

The novel human protein, and encoding gene, provided by the present invention is related to the
family of serine/threonine kinases in general, particularly LIM domain kinases (LIMK), and
shows the highest degree of similarity to LIMK2, and the LIMK2b isoform (Genbank gi8051618)
in particular (see the amino acid sequence alignment of the protein of the present invention
against LIMK2b provided in FIG. 2). LIMK proteins generally have serine/threonine kinase
activity. The protein of the present invention may be a novel alternative splice form of the
art-known protein provided in Genbank gi8051618; however, the structure of the gene provided
by the present invention is different from the art-known gene of gi8051618 and the first exon of
the gene of the present invention is novel, suggesting a novel gene rather than an alternative
splice form. Furthermore, the protein of the present invention lacks an LIM domain relative to
gi8051618. The protein of the present invention does contain the kinase catalytic domain.

Approximately 40 LIM proteins, named for the LIM domains they contain, are known to exist in
eukaryotes. LIM domains are conserved, cysteine-rich structures that contain 2 zinc fingers that
are thought to modulate protein-protein interactions. LIMK1 and LIMK2 are members of a LIM
subfamily characterized by 2 N-terminal LIM domains and a C-terminal protein kinase domain.
LIMK1 and LIMK2 mRNA expression varies greatly between different tissues. The protein
kinase domains of LIMK1 and LIMK2 contain a unique sequence motif comprising
Asp-Leu-Asn-Ser-His-Asn in subdomain VIB and a strongly basic insert between subdomains
VII and VIII (Okano et al., J. Biol. Chem. 270 (52), 31321-31330 (1995)). The protein kinase
domain present in LIMKs is significantly different than other kinase domains, sharing about 32%
identity.

LIMK is activated by ROCK (a downstream effector of Rho) via phosphorylation. LIMK then
phosphorylates cofilin, which inhibits its actin-depolymerizing activity, thereby leading to
Rho-induced reorganization of the actin cytoskeleton (Maekawa et al., Science 285: 895-898,
1999).

The LIMK2a and LIMK2b alternative transcript forms are differentially expressed in a
tissue-specific manner and are generated by variation in transcriptional initiation utilizing
alternative promoters. LIMK2a contains 2 LIM domains, a PDZ domain (a domain that functions
in protein-protein interactions targeting the protein to the submembranous compartment), and a
kinase domain; whereas LIMK2b just has 1.5 LIM domains. Alteration of LIMK2a and LIMK2b
regulation has been observed in some cancer cell lines (Osada et al., Biochem. Biophys. Res.
Commun. 229: 582-589, 1996).

For a further review of LIMK proteins, see Nomoto et al., Gene 236 (2), 259-271 (1999).

Kinase proteins, particularly members of the serine/threonine kinase subfamily, are a major
target for drug action and development. Accordingly, it is valuable to the field of
pharmaceutical development to identify and characterize previously unknown members of this
subfamily of kinase proteins. The present invention advances the state of the art by providing
previously unidentified human kinase proteins that have homology to members of the
serine/threonine kinase subfamily.

SUMMARY OF THE INVENTION

The present invention is based in part on the identification of amino acid sequences of human
kinase peptides and proteins that are related to the serine/threonine kinase subfamily, as well as
allelic variants and other mammalian orthologs thereof. These unique peptide sequences, and
nucleic acid sequences that encode these peptides, can be used as models for the development of
human therapeutic targets, aid in the identification of therapeutic proteins, and serve as targets
for the development of human therapeutic agents that modulate kinase activity in cells and
tissues that express the kinase. Experimental data as provided in FIG. 1
indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant
and fetal brain, and thyroid gland.

DESCRIPTION OF THE FIGURE SHEETS

FIG. 1 provides the nucleotide sequence of a cDNA molecule that encodes the kinase protein of
the present invention. (SEQ ID NO:1) In addition, structure and functional information is
provided, such as ATG start, stop and tissue distribution, where available, that allows one to
readily determine specific uses of inventions based on this molecular sequence. Experimental
data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis,
nervous tissue, bladder, infant and fetal brain, and thyroid gland.

FIG. 2 provides the predicted amino acid sequence of the kinase of the present invention. (SEQ
ID NO:2) In addition structure and functional information such as protein family, function, and
modification sites is provided where available, allowing one to readily determine specific uses of
inventions based on this molecular sequence.

FIG. 3 provides genomic sequences that span the gene encoding the kinase protein of the present
invention. (SEQ ID NO:3) In addition structure and functional information, such as intron/exon
structure, promoter location, etc., is provided where available, allowing one to readily determine
specific uses of inventions based on this molecular sequence. As illustrated in FIG. 3, SNPs were
identified at 42 different nucleotide positions.

DETAILED DESCRIPTION OF THE INVENTION

General Description

The present invention is based on the sequencing of the human genome. During the sequencing
and assembly of the human genome, analysis of the sequence information revealed previously
unidentified fragments of the human genome that encode peptides that share structural and/or
sequence homology to protein/peptide/domains identified

members of this family of proteins and proteins that have expression patterns similar to that of
the present gene. Some of the more specific features of the peptides of the present invention,
and the uses thereof, are described herein, particularly in the Background of the Invention and in
the annotation provided in the Figures, and/or are known within the art for each of the known
serine/threonine kinase family or subfamily of kinase proteins.

Specific Embodiments

Peptide Molecules

The present invention provides nucleic acid sequences that encode protein molecules that have
been identified as being members of the kinase family of proteins and are related to the
serine/threonine kinase subfamily (protein sequences are provided in FIG. 2, transcript/cDNA
sequences are provided in FIG. 1 and genomic sequences are provided in FIG. 3). The peptide
sequences provided in FIG. 2, as well as the obvious variants described herein, particularly
allelic variants as identified herein and using the information in FIG. 3, will be referred herein
as the kinase peptides of the present invention, kinase peptides, or peptides/proteins of the
present invention.

The present invention provides isolated peptide and protein molecules that consist of, consist
essentially of, or comprise the amino acid sequences of the kinase peptides disclosed in the FIG.
2, (encoded by the nucleic acid molecule shown in FIG. 1, transcript/cDNA or FIG. 3, genomic
sequence), as well as all obvious variants of these peptides that are within the art to make and
use. Some of these variants are described in detail below.

As used herein, a peptide is said to be "isolated" or "purified" when it is substantially free of
cellular material or free of chemical precursors or other chemicals. The peptides of the present
invention can be purified to homogeneity or other degrees of purity. The level of purification
will be based on the intended use. The critical feature is that the preparation allows for the
desired function of the peptide, even if in the presence of considerable amounts of other

and characterized^^ the art Tbeing a kinase protein or . coinponents (the feamres of an isolated nucleic acid moL 

part of a kinase protein and are related to the serine/ cculc is discussed betow). • : . ^ . ■ 

threonine kinase sub&mily. Utilizing these sequences, addi. In some uses, "substanUally firee of celhilar material 

tional genomic sequences were assembled and transcript 45 pr^arations of the peptide having l«s than about 

and/or cDNA sequences were isolated and diaracterized. 30% (by dry weight) other proteins (ix., contaminating 

Based on this analysis, the present invention provides amino protein), less than about 20% other proteiiB, less than about 

add sequences of human kinaie peptides and proteins that 10% other proteins, or less than about 5% other protci^. 

arc related to the serine/threonine kinase subfamily, nucleic When the peptide is recombinantiy produced, it can also be 

acid sequences in the form of transcript sequences, cDNA 50 substantiaUy free of culture medium, ix., culture medium 

sequences and/or genomic sequcnas that encode these represents less than about .20% of the vohtmc of jprotein 

kinasb peptides aind proteins, nucleic acid variation (allelic . preparation. f^' 

informationX tissue distribution of expression, and informa- The language "substantially free pf chemical precursors 

tion about the closest art known protein^Jcptide/domain that or other chemicals" includes preparations of the, peptide in 

has structural or sequence homobgy to the kinase of the s5 which it is separated from chemical precursors or other 

present invention. chemicak that are involyd in ite synAcs^ 

In addition to being previously unknown, the peptides that embodiment, the language "substantiaUy free of chemical 

are provided in the present invention arc selected based on precursors or other chemicals" indudes preparations of the 

their ability to be used for tiie devcfopment of commercially kinase peptide having less than about 30%. (by dry weight) 

important products and services. Specifically, the present fio chemical precursors or other chemicals, l<ss than about 20% 

peptides are selected based on homology and/or structural chemical precursors or otiier chemicak, less than about 10% 

relatcdncss to known kinase proteins of tiie scrineAhreonine chemical precursors or other chemicals, or l«s than about 

kinase subfamily and the expression pattern observed. 5% chemical precursors or othw cfaemcals. '- -^^^^ 

Experimental data as provided in FIG. 1 indicates expres- The isolated kinase peptide can be purified froni Cells that 

sion in humans in teratocarcinoma, ovary, testis, nervous 65 naturally express it, purified from cells that 'have been 

tissue, bladder, infant and fetal brain, and thyroid gland. The altered to e^q^rcss it (recombinant), br synthesized iising 

art has clearly e^abli^ed the commercial importance of kno^ protein synthesis methods.' Experimental data as 
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provided in FIG. 1 indicates expression in humans in A chimeric or fusion protein can be pnxhiced by staodaxd 

teratocarcinoma, ovary, testis, nervous tissue, bladder, infant recombinant DNA techniques. For example, DNA fragments 

and fetal brain, and thyroid gland. For example, a nucleic coding for the different protein sequences are tigated 

add molecule encodirig the Idnase peptide is cloned into an together in-frame in accordance with conventional tech- 
expression vector, the expression vector idtroduced into a 5 niques. In another embodiment, the fiision gene can be 

host cell and the protein expressed in the host cell The synthesized by conventioiial techniques Including auto- 

. protein can then be isolated fiom the cells by an appropriate ■ mated DNA synthesizers. Alternatively, PCR amplification 

purification scheme using standard protein purification tedi- of gene fragments can be carried out using anchor primers 

niques. Many of these techniques are described in detail which give rise to complementary overhangs between twp 

below. consecutive gene fragments which can isut^equently be 

Accordingly, the present invention provides proteins that annealed and re-amplified to generate a cfaiineric gene 

consist of the amino add sequences provided in FIG. 2 (SEQ sequence (see Ausubel et al.. Current Protocols itt Molecu- 

ID N0:2), for example, proteins encoded by the transcript/ lar Biology , 1992). Moreover, many expression vectors are 

cDNA nucleic acid sequences shown in FIG. 1 (SEQ ID commerdally available that already encode a fiision moiety 

NChl) and the genomic sequences provided in FIG. 3 (SEQ (e.g., a GST protein). A kinase peptide-enooding nticleic 

ID N0:3). The amino add sequence of such a protein is add can be cloned into such an expression vector 'such that 

provided in FIG. 2. A protein consists of an amino add the fusion moiety is linked in-frame to the kinase peptide, 

sequence when the amino acid sequence is the final amino As mentioned aboye, the present invention also prp^odes 

add sequence of the proteirL and enables obvious variants of the jumino acid ise^Uebce 'of 

The present invention further provides proteins that con- 20 proteins of the present invention, such 'a^ naturally 

sist essentially of the amino add sequences provided in FIG. occurring mature forms of the peptide, 'aUelic/sequonce 

2 (SEQ ID N0:2), for example, proteins encoded by the variants of the peptides, TOU-naturaOy oooirri^ recoihbi- 
transcript/cDNA nucleic add sequences shown in FIG. 1 nantly derived variants of the peptides, and brtholog^ and 
(SEQ ID N0:1) and the genomic sequences provided in FIG. paralogs of the peptides. Such variants can . reactily be 

3 (SEQ ID N0:3). Aprotein consists essentially of an amino 25 generated using art*known techniques in the fields of reoom- 
add sequence when such an amino, acid sequence is present binant mxcleic add technology and protein blochem^try. It 
with only a few additional amiiso acid residues, for example is understood, however, that variants exdiide any amino acid 
from about 1 to about 100 or so additional residues, typically sequences disclosed prior to the invention. - * ' ' 
from 1 to about 20 additional residues in the final protein. Such variants can readily be identified/made using 

The present invention further provides proteins that com- 30 molecular techniques and the sequence irifbrmation dis- 
prise the amino acid sequences provided in FIG. 2 (SEQ ID closed herein. Further, such variants can readily be i£stin- 
N0:2), for example, proteins encoded by the transcript/ guished from other peptides based on sequence and/or 
cDNA iiucleic add sequences shown in FIG. 1 (SEQ ID structural homology to the kinase peptides of the present 
N0:1) and the genomic sequences provided in FIG. 3 (^EQ invention. The degree of hoinology^dentity present will be 
ID N03). A protein comprises an amino acid s^uence 35 based primarily on whether the peptide is a fuixrtional 
when the amino add sequence is at least part of the final variant or non-functional variant, the amount of divergence 
amino add sequence of the protein. In sudi a fashion, the present in the paralog family and the evohitibnaiy distance * 
protein can be ooly the peptide or have additional amino add between the oithologs. - : • . • 

molecules, sudi as amino acid residues (contiguous encoded To detemiine the percent identity of two ' amino acid 
sequence) that are naturally associated with it or heterolo- 40 sequences or two nucleic add sequences, the sequences are 
gous amino add residues/peptidc sequences. Such a protein aligned for optimal comparison purposes (e.g., gaps can be 
can have a few additional amino add residues or can introduced in one or both of a first and a sgcnnd a mt'n n jgnH 
comprise several hundred or more additional amino adds. or nucleic acid sequence for optimal aligimient and nbn* 
'The preferred classes of proteins that are comprised of the homologous sequences can be disregarded for comparison 
kinase peptides of the present invention are the naturally 45 purposes). In a preferred embodiment, at least 30%, 40%, 
occurring mature proteins. A brief description of how van- 50%, 60%, 70%, 80%, or 90% or mote of the length of a 
ous ty^cs of these proteins can be made^solated is provided reference sequence is aligned for comparison purposes. The 
^low. amino add residues or imcleotides at corre^iicling arnino 

The kinase peptides of the present invention can be add positions or nucleotide positions are then cothpared. 
att^hed to heterologous sequences to form chimeric or 50 When a position in the first sequeiice is occupied by the 
fusion proteins. Such chimeric and fusion proteii^ comprise . same aminp add residue or nucleotide as the opnespondiqg 
a kinase peptide operatiyely linked to a heterologous proteiii . position, in the second, seqiieiice, then the molecules are 
having an amino acid sequence not substantially homolo- identical at that position (as used bieiein 'amiiio add or 
gous to the kinase peptide. ''Operatively . linked" indicates nucleic add "identity" is equivalent to amino add or nudeic 
that the kinase peptide and the heterologous protein are 55 add ''homology^. The percent identic between tiie two 
fused in-frame. The heterologous protein can be fiised to the sequences is a function of the number of identical positions 
N-terminus or C-teminus of the kinase peptide: shared by the sequences, taking into account the number of 

In some uses, the fusion protein does not afifect the gaps, and the length of each gap, which need to b6 intro- 
activity of the kiiiase peptide per se. For example, the fusion duced for optimal alignment of the two sequnicn."' 
protein can include, but is not limited to, enzymatic fusion 60 The comparison of sequences and determination of per- 
proteins, for example beta*galactosidase fusions, yeast two- cent identity and similarity between two sequences can' be 
hybrid GAL fusions, poly-His fusions/ MYC-tagged, accomplished using a mathematicar algorithm^ 
Ifl-tagged and Ig fusions. Sudi fusion prpteins, particularly {Comptttational Molecular Biology ^ Lesk, A.' M.v'-cd.i 
poly-His fusions, can facilitate the purification of recombi- Oxford University Press, New York, 198SyBioc6frqnttmg: 
nant kinase peptide. In certain host ccEs (e.g., mammalian 65 Informatics and Genome Projects^ Smith, D. W.^ iML, Aca- 
host cells), ejqiression and/or secretion of a protein can be demic Press, New York, 1993; Computer Analysis cf 
increased by using a heterologous signal sequence. Sequence Data, Part 1, GriflGn, A. M., and GrifSn, H. G.i 




eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, Academic
Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press,
New York, 1991). In a preferred embodiment, the percent identity between two amino acid sequences
is determined using the Needleman and Wunsch (J. Mol. Biol. (48):444-453 (1970)) algorithm which
has been incorporated into the GAP program in the GCG software package (available at
http://www.gcg.com), using either a Blossom 62 matrix or a PAM250 matrix, and a gap weight of 16,
14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet another preferred
embodiment, the percent identity between two nucleotide sequences is determined using the GAP
program in the GCG software package (Devereux, J., et al., Nucleic Acids Res. 12(1):387 (1984))
(available at http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60,
70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another embodiment, the percent identity
between two amino acid or nucleotide sequences is determined using the algorithm of E. Myers and
W. Miller (CABIOS, 4:11-17 (1989)) which has been incorporated into the ALIGN program (version
2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4.

The nucleic acid and protein sequences of the present invention can further be used as a "query
sequence" to perform a search against sequence databases to, for example, identify other family
members or related sequences. Such searches can be performed using the NBLAST and XBLAST programs
(version 2.0) of Altschul, et al. (J. Mol. Biol. 215:403-10 (1990)). BLAST nucleotide searches can
be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences
homologous to the nucleic acid molecules of the invention. BLAST protein searches can be performed
with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to the
proteins of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can
be utilized as described in Altschul et al. (Nucleic Acids Res. 25(17):3389-3402 (1997)). When
utilizing BLAST and gapped BLAST programs, the default parameters of the respective programs
(e.g., XBLAST and NBLAST) can be used.

Full-length pre-processed forms, as well as mature processed forms, of proteins that comprise one
of the peptides of the present invention can readily be identified as having complete sequence
identity to one of the kinase peptides of the present invention as well as being encoded by the
same genetic locus as the kinase peptide provided herein. The gene encoding the novel kinase
protein of the present invention is located on a genome component that has been mapped to human
chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as
STS and BAC map data.

Allelic variants of a kinase peptide can readily be identified as being a human protein having a
high degree (significant) of sequence homology/identity to at least a portion of the kinase
peptide as well as being encoded by the same genetic locus as the kinase peptide provided herein.
Genetic locus can readily be determined based on the genomic information provided in FIG. 3, such
as the genomic sequence mapped to the reference human. The gene encoding the novel kinase protein
of the present invention is located on a genome component that has been mapped to human chromosome
22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC
map data. As used herein, two proteins (or a region of the proteins) have significant homology
when the amino acid sequences are typically at least about 70-80%, 80-90%, and more typically at
least about 90-95% or more homologous. A significantly homologous amino acid sequence, according
to the present invention, will be encoded by a nucleic acid sequence that will hybridize to a
kinase peptide encoding nucleic acid molecule under stringent conditions as more fully described
below.

FIG. 3 provides information on SNPs that have been found in the gene encoding the kinase protein
of the present invention. SNPs were identified at 42 different nucleotide positions. Some of these
SNPs, which are located outside the ORF and in introns, may affect gene transcription.

Paralogs of a kinase peptide can readily be identified as having some degree of significant
sequence homology/identity to at least a portion of the kinase peptide, as being encoded by a gene
from humans, and as having similar activity or function. Two proteins will typically be considered
paralogs when the amino acid sequences are typically at least about 60% or greater, and more
typically at least about 70% or greater homology through a given region or domain. Such paralogs
will be encoded by a nucleic acid sequence that will hybridize to a kinase peptide encoding
nucleic acid molecule under moderate to stringent conditions as more fully described below.

Orthologs of a kinase peptide can readily be identified as having some degree of significant
sequence homology/identity to at least a portion of the kinase peptide as well as being encoded by
a gene from another organism. Preferred orthologs will be isolated from mammals, preferably
primates, for the development of human therapeutic targets and agents. Such orthologs will be
encoded by a nucleic acid sequence that will hybridize to a kinase peptide encoding nucleic acid
molecule under moderate to stringent conditions, as more fully described below, depending on the
degree of relatedness of the two organisms yielding the proteins.

Non-naturally occurring variants of the kinase peptides of the present invention can readily be
generated using recombinant techniques. Such variants include, but are not limited to deletions,
additions and substitutions in the amino acid sequence of the kinase peptide. For example, one
class of substitutions are conserved amino acid substitution. Such substitutions are those that
substitute a given amino acid in a kinase peptide by another amino acid of like characteristics.
Typically seen as conservative substitutions are the replacements, one for another, among the
aliphatic amino acids Ala, Val, Leu, and Ile; interchange of the hydroxyl residues Ser and Thr;
exchange of the acidic residues Asp and Glu; substitution between the amide residues Asn and Gln;
exchange of the basic residues Lys and Arg; and replacements among the aromatic residues Phe and
Tyr. Guidance concerning which amino acid changes are likely to be phenotypically silent are found
in Bowie et al., Science 247:1306-1310 (1990).

Variant kinase peptides can be fully functional or can lack function in one or more activities,
e.g. ability to bind substrate, ability to phosphorylate substrate, ability to mediate signaling,
etc. Fully functional variants typically contain only conservative variation or variation in
non-critical residues or in non-critical regions. FIG. 2 provides the result of protein analysis
and can be used to identify critical domains/regions. Functional variants can also contain
substitution of similar amino acids that result in no change or an insignificant change in
function. Alternatively, such substitutions may positively or negatively affect function to some
degree.
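
The conservative-substitution classes named in this section (aliphatic, hydroxyl, acidic, amide, basic and aromatic residues) and the position-by-position notion of identity can be illustrated with a short sketch. The Python below is illustrative only and is not part of the specification. The function names, the one-letter residue codes, the "-" gap symbol, and the choice to divide the number of identical positions by the full alignment length are all assumptions made for the example. The specification itself relies on programs such as GAP, ALIGN and BLAST to produce and score alignments, and this sketch only tallies columns of two sequences that have already been aligned.

```python
# Conservative-substitution groups as listed in the text, in one-letter codes.
CONSERVATIVE_GROUPS = [
    set("AVLI"),  # aliphatic: Ala, Val, Leu, Ile
    set("ST"),    # hydroxyl: Ser, Thr
    set("DE"),    # acidic: Asp, Glu
    set("NQ"),    # amide: Asn, Gln
    set("KR"),    # basic: Lys, Arg
    set("FY"),    # aromatic: Phe, Tyr
]


def classify_position(a, b):
    """Classify one aligned column as gap, identical, conservative or non-conservative."""
    if a == "-" or b == "-":
        return "gap"
    if a == b:
        return "identical"
    if any(a in group and b in group for group in CONSERVATIVE_GROUPS):
        return "conservative"
    return "non-conservative"


def summarize_alignment(seq1, seq2):
    """Tally column classes for two pre-aligned, equal-length sequences.

    Assumption for this sketch: percent identity is the number of identical
    columns divided by the total number of aligned columns, gaps included.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"identical": 0, "conservative": 0, "non-conservative": 0, "gap": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        counts[classify_position(a, b)] += 1
    total = len(seq1)
    counts["percent_identity"] = 100.0 * counts["identical"] / total if total else 0.0
    return counts


if __name__ == "__main__":
    # Hypothetical aligned peptides, chosen only to exercise each class.
    print(summarize_alignment("AVLK-ST", "AILRDSW"))
```

For the hypothetical aligned pair AVLK-ST and AILRDSW, the sketch counts three identical positions, two conservative substitutions (Val/Ile and Lys/Arg), one non-conservative substitution (Thr/Trp) and one gap column.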
Non-functional variants typically contain one or more non-conservative amino acid substitutions,
deletions, insertions, inversions, or truncation or a substitution, insertion, inversion, or
deletion in a critical residue or critical region.

Amino acids that are essential for function can be identified by methods known in the art, such as
site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham et al., Science
244:1081-1085 (1989)), particularly using the results provided in FIG. 2. The latter procedure
introduces single alanine mutations at every residue in the molecule. The resulting mutant
molecules are then tested for biological activity such as kinase activity or in assays such as an
in vitro proliferative activity. Sites that are critical for binding partner/substrate binding can
also be determined by structural analysis such as crystallization, nuclear magnetic resonance or
photoaffinity labeling (Smith et al., J. Mol. Biol. 224:899-904 (1992); de Vos et al. Science
255:306-312 (1992)).

The present invention further provides fragments of the kinase peptides, in addition to proteins
and peptides that comprise and consist of such fragments, particularly those comprising the
residues identified in FIG. 2. The fragments to which the invention pertains, however, are not to
be construed as encompassing fragments that may be disclosed publicly prior to the present
invention.

As used herein, a fragment comprises at least 8, 10, 12, 14, 16, or more contiguous amino acid
residues from a kinase peptide. Such fragments can be chosen based on the ability to retain one or
more of the biological activities of the kinase peptide or could be chosen for the ability to
perform a function, e.g. bind a substrate or act as an immunogen. Particularly important fragments
are biologically active fragments, peptides that are, for example, about 8 or more amino acids in
length. Such fragments will typically comprise a domain or motif of the kinase peptide, e.g.,
active site, a transmembrane domain or a substrate-binding domain. Further, possible fragments
include, but are not limited to, domain or motif containing fragments, soluble peptide fragments,
and fragments containing immunogenic structures. Predicted domains and functional sites are readily
identifiable by computer programs well known and readily available to those of skill in the art
(e.g., PROSITE analysis). The results of one such analysis are provided in FIG. 2.

Polypeptides often contain amino acids other than the 20 amino acids commonly referred to as the 20
naturally occurring amino acids. Further, many amino acids, including the terminal amino acids,
may be modified by natural processes, such as processing and other post-translational
modifications, or by chemical modification techniques well known in the art. Common modifications
that occur naturally in kinase peptides are described in basic texts, detailed monographs, and the
research literature, and they are well known to those of skill in the art (some of these features
are identified in FIG. 2).

Known modifications include, but are not limited to, acetylation, acylation, ADP-ribosylation,
amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment
of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative,
covalent attachment of phosphotidylinositol, cross-linking, cyclization, disulfide bond formation,
demethylation, formation of covalent crosslinks, formation of cystine, formation of pyroglutamate,
formylation, gamma carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination,
methylation, myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation,
racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins
such as arginylation, and ubiquitination.

Such modifications are well known to those of skill in the art and have been described in great
detail in the scientific literature. Several particularly common modifications, glycosylation,
lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and
ADP-ribosylation, for instance, are described in most basic texts, such as Proteins-Structure and
Molecular Properties, 2nd Ed., T. E. Creighton, W. H. Freeman and Company, New York (1993). Many
detailed reviews are available on this subject, such as by Wold, F., Posttranslational Covalent
Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York 1-12 (1983); Seifter et al.
(Meth. Enzymol. 182: 626-646 (1990)) and Rattan et al. (Ann. N.Y. Acad. Sci. 663:48-62 (1992)).

Accordingly, the kinase peptides of the present invention also encompass derivatives or analogs in
which a substituted amino acid residue is not one encoded by the genetic code, in which a
substituent group is included, in which the mature kinase peptide is fused with another compound,
such as a compound to increase the half-life of the kinase peptide (for example, polyethylene
glycol), or in which the additional amino acids are fused to the mature kinase peptide, such as a
leader or secretory sequence or a sequence for purification of the mature kinase peptide or a
pro-protein sequence.

Protein/Peptide Uses

The proteins of the present invention can be used in substantial and specific assays related to the
functional information provided in the Figures; to raise antibodies or to elicit another immune
response; as a reagent (including the labeled reagent) in assays designed to quantitatively
determine levels of the protein (or its binding partner or ligand) in biological fluids; and as
markers for tissues in which the corresponding protein is preferentially expressed (either
constitutively or at a particular stage of tissue differentiation or development or in a disease
state). Where the protein binds or potentially binds to another protein or ligand (such as, for
example, in a kinase-effector protein interaction or kinase-ligand interaction), the protein can be
used to identify the binding partner/ligand so as to develop a system to identify inhibitors of the
binding interaction. Any or all of these uses are capable of being developed into reagent grade or
kit format for commercialization as commercial products.

Methods for performing the uses listed above are well known to those skilled in the art. References
disclosing such methods include "Molecular Cloning: A Laboratory Manual", 2d ed., Cold Spring
Harbor Laboratory Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989, and "Methods in
Enzymology: Guide to Molecular Cloning Techniques", Academic Press, Berger, S. L. and A. R. Kimmel
eds., 1987.

The potential uses of the peptides of the present invention are based primarily on the source of
the protein as well as the class/action of the protein. For example, kinases isolated from humans
and their human/mammalian orthologs serve as targets for identifying agents for use in mammalian
therapeutic applications, e.g. a human drug, particularly in modulating a biological or
pathological response in a cell or tissue that expresses the kinase. Experimental data as provided
in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in
teratocarcinoma, ovary, testis, nervous tissue, bladder, infant
brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based
tissue screening panels indicate expression in fetal brain. A large percentage of pharmaceutical
agents are being developed that modulate the activity of kinase proteins, particularly members of
the serine/threonine kinase subfamily (see Background of the Invention). The structural and
functional information provided in the Background and Figures provide specific and substantial uses
for the molecules of the present invention, particularly in combination with the expression
information provided in FIG. 1. Experimental data as provided in FIG. 1 indicates expression in
humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and
thyroid gland. Such uses can readily be determined using the information provided herein, that
which is known in the art, and routine experimentation.

The proteins of the present invention (including variants and fragments that may have been
disclosed prior to the present invention) are useful for biological assays related to kinases that
are related to members of the serine/threonine kinase subfamily. Such assays involve any of the
known kinase functions or activities or properties useful for diagnosis and treatment of
kinase-related conditions that are specific for the subfamily of kinases that the one of the
present invention belongs to, particularly in cells and tissues that express the kinase.
Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention
are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain,
and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue
screening panels indicate expression in fetal brain.

The proteins of the present invention are also useful in drug screening assays, in cell-based or
cell-free systems. Cell-based systems can be native, i.e., cells that normally express the kinase,
as a biopsy or expanded in cell culture. Experimental data as provided in FIG. 1 indicates
expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal
brain, and thyroid gland. In an alternate embodiment, cell-based assays involve recombinant host
cells expressing the kinase protein.

The polypeptides can be used to identify compounds that modulate kinase activity of the protein in
its natural state or an altered form that causes a specific disease or pathology associated with
the kinase. Both the kinases of the present invention and appropriate variants and fragments can be
used in high-throughput screens to assay candidate compounds for the ability to bind to the kinase.
These compounds can be further screened against a functional kinase to determine the effect of the
compound on the kinase activity. Further, these compounds can be tested in animal or invertebrate
systems to determine activity/effectiveness. Compounds can be identified that activate (agonist) or
inactivate (antagonist) the kinase to a desired degree.

Further, the proteins of the present invention can be used to screen a compound for the ability to
stimulate or inhibit interaction between the kinase protein and a molecule that normally interacts
with the kinase protein, e.g. a substrate or a component of the signal pathway that the kinase
protein normally interacts (for example, another kinase). Such assays typically include the steps
of combining the kinase protein with a candidate compound under conditions that allow the kinase
protein, or fragment, to interact with the

transduction such as protein phosphorylation, cAMP turnover, and adenylate cyclase activation, etc.

Candidate compounds include, for example, 1) peptides such as soluble peptides, including Ig-tailed
fusion peptides and members of random peptide libraries (see, e.g., Lam et al., Nature 354:82-84
(1991); Houghten et al., Nature 354:84-86 (1991)) and combinatorial chemistry-derived molecular
libraries made of D- and/or L-configuration amino acids; 2) phosphopeptides (e.g., members of
random and partially degenerate, directed phosphopeptide libraries, see, e.g., Songyang et al.,
Cell 72:767-778 (1993)); 3) antibodies (e.g., polyclonal, monoclonal, humanized, anti-idiotypic,
chimeric, and single chain antibodies as well as Fab, F(ab')2, Fab expression library fragments,
and epitope-binding fragments of antibodies); and 4) small organic and inorganic molecules (e.g.,
molecules obtained from combinatorial and natural product libraries).

One candidate compound is a soluble fragment of the receptor that competes for substrate binding.
Other candidate compounds include mutant kinases or appropriate fragments containing mutations that
affect kinase function and thus compete for substrate. Accordingly, a fragment that competes for
substrate, for example with a higher affinity, or a fragment that binds substrate but does not
allow release, is encompassed by the invention.

The invention further includes other end point assays to identify compounds that modulate
(stimulate or inhibit) kinase activity. The assays typically involve an assay of events in the
signal transduction pathway that indicate kinase activity. Thus, the phosphorylation of a
substrate, activation of a protein, a change in the expression of genes that are up- or
down-regulated in response to the kinase protein dependent signal cascade can be assayed.

Any of the biological or biochemical functions mediated by the kinase can be used as an endpoint
assay. These include all of the biochemical or biochemical/biological events described herein, in
the references cited herein, incorporated by reference for these endpoint assay targets, and other
functions known to those of ordinary skill in the art or that can be readily identified using the
information provided in the Figures, particularly FIG. 2. Specifically, a biological function of a
cell or tissues that expresses the kinase can be assayed. Experimental data as provided in FIG. 1
indicates that the kinase proteins of the present invention are expressed in humans in
teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as
indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels
indicate expression in fetal brain.

Binding and/or activating compounds can also be screened by using chimeric kinase proteins in which
the amino terminal extracellular domain, or parts thereof, the entire transmembrane domain or
subregions, such as any of the seven transmembrane segments or any of the intracellular or
extracellular loops and the carboxy terminal intracellular domain, or parts thereof, can be
replaced by heterologous domains or subregions. For example, a substrate-binding region can be used
that interacts with a different substrate than that which is recognized by the native kinase.
Accordingly, a different set of signal transduction components is available as an end-point assay
for activation. This allows for assays to be performed in other than the specific

targd molecule. «.d to deted fcTformation of a complex host ceU from whid. Uk kmase is derived, 

between the protein and die target or to deted the biochemi- 65 The proteins of the present mvenUon are also usefiil m 

cal consequence of Uie interaction with Uie kinase protein competition binding assays m methods designed to discover 

and the target, sudi as any of the associated effects of signal compounds that interad with Uw kinase (e.g. binding part- 
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ners and/or ligaads). Thus, a compound , b exposed to a kinase activity m a pharmaceuticd compositioo to a subjed 

. kinase poljrpeptide under conditions that allow the com- in need of sucb treatment, the modulator being identified as 

pound to bind or to otherwise interact with the polypeptide. described heireia 

Sohible kinase polypeptide is also added to the mixture. If in yet another aspect of the indention, the kinase proteins 

the test compound interacts with the soluble kinase 5 can be used as "Ijait proteins" in a. two-hybrid assay or 

polypeptide, it decreases thte ampxint of complex formed or three-hybrid assay (s^ e;g., U:S. Pat^ Nbl 5,283,317; Zer- 

activity from the Mnase target Ihis tyjpe of vos et aL i(1993) C^i/ 72:223-232; Madura et i;^(^^ 

ticdafly useful in cases in 'wMcha)mpouhds are sbu^t that ' J?io£ C^^wl 268;12646^li054; 

interact with specific regions of the kinase. Thus, the soluble ni^ey 14.-920-924; Iwabuchi ct aL (1993) 'Oncogene 

polypeptide that competes with the target kinase region is 8:1693/1696; and Brent WO9411Q300), to identify other 

deseed to contain peptide sequences corresponding to the proteins^ U^cfa bind to or interact with the kiiiase and are 

region of interest involved in kinase activity. Such kinasc-binding proteii^ are 

To pcrforni cell firee drug screening assays, it is some- also likely to be inviplved in the propagation of signals b 
times desirable to immobilize either the kinase protein, or proteins or kinase targets ais, for example, down- 
fragment, or its target molecule to facilitate separation of stream elements of a kinase-mediated sfenaling pathway, 
complexes from uncomplexed forms of one or both of the A^^^^i^^J?' ^ kmase-bmding proteins are Hcely to be 
proteins, as well as to acoonunodate automation of the assay. kiDu^ mhibi tors. ^ " ^ ' . - 

Techniques for immobilizirig proteins on ihitriccs can be J?!*^^^ "^^^ ^ T^^^J^ 

;n th^ «/-r««;«« «««« f « *«k^.w«» • transcrq)tioD factors, ^ch cwisist of sparable DNA- 

used in the drug s^ccmng a^3^ In one^embodunen^ a and activattori domains. Briefly, theassay utilizes 
f^ion protein can be provided which adds a domam that ^ tv3erentDNAconstructs.InonecoLuct,theiene^ 

ano^ the protem to be bound to a matrix. For cxar^le codes for a kimise protein is fused to a^enc^ng the 

ghitathwne^-transferase fusion proteins can te adsorbed DNAbinding domdn of a known transection iSSor(e.g., 

onto ghitathipne sepharose beads (Sigma Chemical, St GAL4). In the other construct, a DNA sequence, from a 

Louis, Mo.) or glutathione derivatized microtitre plates, Kbrary of DNA sequences, that encodes an uniderAified 
which are then combined with the cell lysates (e.g., ^S- 25 protein (*^rey^ or "sample'^ is fased to a gene that codes ^^f^^ 

labeled) and the candidate compound, and the mixture the activation domdn of tte known tiaiBcrq^ 

incubated under conditions conducive to complex formation the *l)ait" and the "prey** proteins arc ible to interact, in 

(e.g., at physiological conditions for salt and pH). Following viyp, formir^ a kinase-dcpendeht ooo^lex, the DNA- 

incubation, the beads are washed to remove any unbound bindiiig and activation domains of the transcrq)tiod factor 
label, and the matrix immobilized and radiolabeldetomined 30 are t>n>ught into dose proximity. This proximity 'allow^ 

directly, or in th6 supemataiit after the complexes are transcrQ)lion of a rqwrter gene (e.g., ^cZ) \(1uch is oper- 

dissociated. Alternatively, the complexes can be dissociated . ably linked to a traniscriptional regulatory site responsiv^ to 

from the matrix^ separated by SDS-PAGE, aiid the level of the transcrq)tion factor. Expression of the reporter gciie can 

kinase-bihding protein found in tbe bead fraction quantitated be detedied and cell colonies containing the functional 
from the gel using standard electrophoretic tedudques. For 35 transaction factor can be isolated and used to obtain the 

example, either the polypeptide or its target molecule can be cloned jgene which encodes the protein whidi interacts with 

immd>ilized utilizing conjugation of biotin and streptavidin the kinase protein. ^' ' / ■ • ' '-. f - v 

' using techniques well known in the art Alternatively, anti- This invinition further pertairis to novel agents idetttified 

bodies reactive with the protein but which do not interfere by the above-described screening assays. Acoordingjly, it is 
with binding of the protein to its target molecule can be 40 within the scope of this invention to forther use an agent 

derivatized to the wells of the plate, and the protein trapped identified as described herein in an appropriate aniinal 

in the wells by antibody conjugation. Preparations of a inodeL For example, an agent identified as described herein 

kinase-binding protein aiKl a candidate compound are incu-. (e.g., a kinase-modulating agent, an antisense kiiiase oudeic 

bated in the kinase protein-presenting weUs and the amount add molecule, a kiiiase-specific antibody, or a kinase- 
of complex trapped in the well can be quantitated. Methods 45 bincfing partner) can be used in an animal or other tnodel to 

for detecting such complexes, in addition to those described determine the efiScacy, toxidty, or side efiBccts of treatment 

above for the GST-immd)ilized complexes, indude immu- with such an agent Altemadyely, an agent identified as 

nodetection of complexes using aiitibodies reactive with the described herein can be used in an animal or odier model to 

kinase protein target molecule, or livliich are reactive, with determine the mechanism of actbn of such an agpnt 
kinase protein and compete with the target molecule, as well 50 Furthermore, this invention pertains to uses of novel agents . 

as enzyme -linked assays which rely on detecting an eoizy- identified by the above-described screening assays for treat- . 

qiatic activity associated with the target . nients as described herein; . " : n : 

Agents that modulate one of the kinases of the present The kinase' .proteins of thie present invention are alsb 

invention can be identified using one or more of the above useful to provide a target for Hi'agnnfii'ng a disea^ or 

assays, alone or in combinatioa It is generally preferable to 55 predisposition to disease mediated by the peptide, 

use a cell-based or cell free system first and then confinn Accordingly, the invention provides methods for detecting . 

activity in an animal or other model system. Such model the presence, or levek o^ the protein (or encoding mRNA) 

systems are well known in the art and caii readily be in a cell, tissue, or oiganisnLExperiinental data as provided 

empbyed in this context. . ; .1 in HG.l indicate eiq^ression in faumiuis in tcratocaidnbm^^^ 

Modulators ofkinase protein activity identifii^ according 60 ovary, testis, nervous tissue, bladder; infant and feUd b^ 

to these drug screening assays can be used to treat a subject and thyroid gland. The method involves contacting a bib- 

with a disorder mediated by the kinase pathway, by treating logical sample with a compound enable of interacting with 

cells or tissues that express the kinase. Experimental data as the kinase protein such that the interaction can be detected, 

provided in FIG. 1 indicates oppression in humans in Sudi an assay can be provided in a single detection format 

teratocarcinoma, ovary, testis, nervous tissue, bladder, infant 65 or a multi-detecdon format such as an antibody chip ariray. 

and fetal brain, and thyroid gland. These methods of treat- One agent for detecting a protein in a sample' is an 

ment include the steps of administering a modulator of antibody capable of selectively binding to proteiiL A bio- 
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logical sample includes tissues, cells and biobgical fluids more or less active in substrate binding, and kinase activa-^ 

isolated from a subject, as well as tissues, cells and fluids tion. Accordingly, substrate dosage would necessarily be 

present within a subject. modified to maximize the therapeutic cfifect within a given 

The peptides of the present invention also provide targets population containing a polymorphism. As an alternative to 

for diagnosing active protein activity, disease, or predi^x)- 5 genotyping, specific polymorphic peptides could be identi- 

sition to disease, in a patient having a. variant peptide, ficd, ' , . ^ • : 

particularly activities and conditions that are known for ^ The peptides are also usefiil for treatiiig a disorder char- 

other members oftbe fainOy ofproteins to which tl^ present acterized by an absence, of, xnapprc^ri^ or unwanted 

one belongs. Thus, the peptide can be isolated from a eiqiression of the protein. Experimental data as provided in 

biological sample and assayed for the presence of a genetic ]o FIG. 1 indicates expression in humans in teratocardnoma, 

mutation that results in aberrant peptide. This includes ovary, testis, nervous tissue, bladder, infant arnl fetal brain, 

amino add substitution, deletion, insertion, rearrangement, ^d thyroid gland Aooordingly, methods for treatment 

(as the result of aberrant splicing events), and inappropriate indude the use of the kinase protein or fragments. , . 

post-translatibnal modification. Analytic methods include Aritibodies " ; . 0. 

altered clectrophoretic mobility, altered tryptic peptide 15 _ . . , ^ ^. , ' • ; 

digest. alteredldnaseactiWtyin<iU-based or aificT^ also provd^i antibodies that selective y 

aUcration in substrate or antibody-biodir^g pattern, altered bmd to erne of the peptides of the presem mw^^^ 

isoelectric point, direct amino add sequendng, and any ownpiismg sudi a pcpti^^ 

other of die known assay tedmiques useful for detecting f^^^^- ^y^^ herein.^ antibody selectively binds a 
mutations in a protein. Sudi an assay can be provided in a 20 ^ti^^P^^T^^ bmds the target pcpUde and does not 

single detection format or a multi-detcction frimiat sudi as ^^f^^V bind to unrekted protems. An antibody is stfll 

an antibody chip array considered to selectively bmd a peptide even if it also bmds 

In vitro techniques for detection of peptide include jo other proteins that are not suteantia^ 

enzyme linked imiliunosod>ent assays (ELKAs), Western ^J^^^jP^^^f ^ ^^ng as such proteu^ share homology 

blok immunoprcdpitations and immLifluoresance using 25 ^^^^^-^J^^jl^'^J^^^ "ff^ *lSf 

a detection reicntTsuch as an antibody or protein bindini "Ofcody In to 

agent. Alternatively, the peptide can be dpteLd in vivo in a J^J^^J^^'??" is stiU selective despite some degree 

subject by introducing into the subject a labeled anti-peptide ^ . ^' 

antibody or other types of detection agent. For example, the ^ »° ^^J^ m tenns oons^tent 

antibody can be labeled with a radioactive marker whose 30 reagnaed within the art: they are multi^umt 

presence and location in . a subject can be detected by pra««'?s.piod.«d by amammdun oiganism m re^nse to 

standard imaging techniques. Particularly usefiU are nieth- »nandgenchanenge.llie utiboAesof thepiesent^^ 

ods that detect the allelii variant bf a pq)tide expressed in a "ocbide polyclonal antibo^es and^oaoctonal utibodies, as 

subject and methods which detect fragment^ of a peptide in ^ . IT .^l^* 

a sample 35 ®' »\^72» *nd Fv fragments. 

The peptides are also useful in pharmacogenomic analy- . Manymethods arc known for gerierating antVor identify- 

sis. Pharmacogenomics deal with clinically significant jng^ntibodies to a pven to^^^ 

hereditary variSions in the response to drugs due loaltered ^J^J^'f^f.n^^ Antibodies, CoW Spring 

drug disposition and abnormal action in affected persons. Hartwr tress, (1989;. 

See, eg., Eichelbaum, M. (Oul JEjgx Pharmacol Physiol 40 ^ general, to generate antibodies, an isolated peptide is 

23(10-ll):983-985 (1996)), and Unde^ M. W. (Cluu Chem, ^ ^ *° umnunogpn and is administered to a mammalian 

43(2):254-266 (1997)). The clinical outcomes of these otgaiusm, such as a rat, rabbit or mouse. The fuU-lcngth 

variations result in severe toxicity of therapeutic drugs in protein, an antigenic peptide fragment or a fiision protein 

certain individuals or therapeutic feilurc of drugs in certain ^ *f . Particulariy important fragmente arc tlK>se 
individuals as a result of individual variation in metabolism. 45 covering functional domains, such as the domains identified 

TTius, the genotype of the individual can determine the way ^ 2, and domain of sequence homology or divergence 

a therapeutic compound acts on the body or the way the amongst tiie family, sudi as tiiose that can readily be 

body metabolizes the compound. Further, the activity of identified using protein aligament methods and as presented 

drug metabolizing enzymes effects both the intensity and F»g^«s. : x - ; 

duration of drug action. Thus, the pharmacogenomics. of the 50 Antibodies arc preferably prepared from regions or dis- 

individual permit the selection of effective compounds and fete fragments of the kiiiase proteins. Antibodies can :|>e 

effective dosages of such coiUpounds for prophj^actic or prepared from any regipn of the peptide as described herein, 

tiicrapeutic treaUnent based on the individual's genotype. However, preferred regjons will indude those involved in 

The discovery of genetic polymorphisms in some drug functiori/activity and^or kinase/binding partna interaction. 

ineUbolizing enzymes has explained why some patients do 55 FIG, 2 can be used to identify particularly important regions 

not obtain the expected drug effects, show an exaggerated while sequence alignment can be used to identify conserved 

drug effect, or experience serious toxidty from standard ^ npiquc isequence fragments. . - ■ 

drug dosages. Polymorphisms can be expressed in the phe- An antigenic fragment will typically comprise at least 8 

notype of the extensive metabolizer and the phenotype of the contiguous amino add residues. The antigenic peptide can 

poor metabolizer. Accordingly, genetic polymorphism may 60 comprise, however, at least 10, 12, 14, 16 or more amino 

lead to allelic protein variants of the kinase protein in which acid residues. Such fragments can be selected <m a physical 

one or more of the kinase functions in one population is pniperty, such as fragments correspond to regions that are 

different from those in another populatioiL The peptides thus located on the ^irfaoe of the protein, e.g., hydrophiHc 

allow a target to ascertain a genetic predispos^n that ^n regions or can be selected based on sequerxx uniqueness 

affect treatment modality. Thus, in a ligand4)^ased treatment, 65 (see FIG. 2). 

polymorphism may give rise to amino terminal extracellular Detection on an antibody of the present invention can be 

domains and/or other substrate-binding regions that are facilitated by cotqiling (ix., physically linking) the antibody 
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Lh^Jf^'"'^'^*- Ewniplesof detcctablesubstanoes proteins can be used to identify individuals that fequirc 

S!l,T ^"T't modalitieslSe antibodies «c al^ 

^^r^ 1""^ bujluinmesceat materials. M as diagnostic tools as «, immnnolpgical baiter for 

and radioactive matenab. Examples of suitable enzymes aberrant protein analyzed by electropbo^ m^fficJ i». 

. r i ' "fT"^'"'' P*fi»?.««' phosphatase. 5 electric point, tiyptic^tide dig«t i^d oE p?4^ 

.. p^jalactosidase. or acetylcholinesterase; exaiiq)les of suit- . . assays known to Aose inite art^ F'?!^"'. 

' able prosthetic group complexes indude streptevidiiiA>iotin ti.. i-ta^:- ' • n.^- . ' ' ■- 

•include umbcllifcrone, fluorescein, fluoresc^ ™- ^ "'^^ cicpressiou in 

isothiocyanate. rhodamine. dichlorotriazinylamine ,o iTJ^r ZT'. "^ 

fluoresceii^dansylchlorideorphycoerythrinianci^^^ ^f?' "^V'l^ ^L"^ ^-y^id gland. Tims, 

a luminescent material inchdcs li^S «^plcs^b^^^ ^f .L^r ^"^"^"^ ^""^"^ ^ ^^'^^ 

minescent materials include luciferase. ludferin. «d l^St^'. T ^ 

aequorin, and examples of suitable radioactive material ^^"'/^^be used to identify a tissue type. : . 

include "^I, or 'H. 15 ^® antibodies are also useful for inhibiting protein 

. nl J .r fiuw^n. for example, bloddpg the binding of the Id^ 

Antibody Uses . . pq>tidc to a binding partner such as a substrate. These uses 

The antibodies can be used to isolate one of the proteins *^ *^ ^ applied in a therapeutic context in which 

of the present invention by standard techniques, such as treatment involves inhibiting the protein's function. An 

afliiiily chromatography or immuhopredpitation. The anti- ^ antibody can be used, for example, to block binding, thus 

bodies can facilitate the purification of the natural protein modulating (agonizing or antagonizing) the peptides activ- 

from cells and recombinantly produced protein expressed in ^^V- Antibodies can be prepared against speci&c fragments 

host cells. In addition, such antibodies are useful to delect containing sites required for fiinction or against intact pro- 

thc presence of one of the proteins of the present invention ^ assodated.with a cell or cell membrane. Sec HG, 

in cells or tissues to detennine the pattern of expression of ^ structural information relating to the proteins of the 

the protein among various tissues in an oiganism and over present invention. . : .r i; , : ' 

the course of normal development^ Experimental data as The invention also encompasses kits for using ^tibodies 

provided in HG. 1 indicates that the kinase proteins of die to detect the presence of a protein in a bi)lpgicai sample 

present invention are iexpressed in humans in The kit can comprise antibodies such as a labeled or labcl- 
teratocarcmoma,ovaiy, testis, nervous tissue, bladder, infant 3^ able antibody and a compound or agent for detecting protein 

bram, and thyroid gland, as indicated by virtual northern blot in a biological sample; means for determining the amoimt of 

^alysis. . In addition, PCR-based tissue screening panels protein in the sample; means for comparing the imbunt of 

mdicate expression in fcUl brain. Further, such antibodies protein in the sample with a standard; and instructions for 

can be used to detect protein in situ, in vitro, or in a ceU use. Such a kit can be supplied to detect a single proteiii or 
lysate or supernatant in order to evaluate the abundance and 35 epitope or can be configured to detect one of a multitude of 

pattern of expression. Also, such antibodies can be used to epitopes, such as in an antibody detection array. Arrays are 

assess abnormal tissue distrit>ution or abnormal expression described in detafl below for nulcic acid arrays and simihu- 

during development or progression of a biological condition. methods have been developed for antibody arrays V 

Antibody detection of circulating fragments of ^the full - : 

length protein can be used to identify turnover. 40 Nucleic Add Molecules ; , . 

. Further, the antibodies can be used to assess ejqprcssbn in The present invention fiirther provides isolated nuddc 

disease states such as m active stages of the disease or in an add molecules that encode a kinase peptide or brotcin of the 

mdividual with a predisposition toward disease related to the present invention (cDNA, transcrq)t and genomic sequence) 

protcm s funchon. When a disorder is caused by an inap- Such nucleic add molecules will consist of, conSt cssen- 

propnate tissue distribution, developmental expression, 45 tiallyot or comprise a nudcotide sequence that encc^ one 

tevel of cjqpr^on of the protdn. or expressed/iirooesscd of the kinase peptides of the present invention, an aUeUc 

form, the antibody can be prepared against the normal variant thereof or an ortholog or paralog thercot - 

*f P''^'?^ ^ * Asusedherein,an«isohted"micleicacidmoIeculeis6w 

ah^d^HW^ ^' mfant axid f^tal brain and thyroid so naturd source of the nucleic add Preferably, an -plated" • 
Sntlin nucleic add is free offences which nat^aflyflank^^^^ 

u^dZ .^^^^ nuckic add CI.C., sequences located at the 5- and^^^^^ 

'^J^'^'^y^^^^^ the nuciek add) intL genomic DNAof the organ^ 

Jhe antibodies can also be used to assess normal and whidi the nucleic add is derived. However. Acre can be 
^rranl subceUular locahzation of cells in. the various 55 some flanking nucleotide sequences, for example up to 
tou« in an orgaman. Dqperfmentil data as provided in about 5KB, 4KB, 3KB. 2KB. or 1KB or less, particularly 
MG. 1 mAcates expression m humans in tcratocardnpma, contiguous peptide encoding sequences and peptide eiool 
ovary test^ nervoia tissue, bladder, infiint and fetal brain, ingscquenccs within die same^cne but separk^ by idroi 
and thyroid gland. The diagnostic uses can be applied, not in the genomic ^sequence. The important point is^tiii the 
onfy in genetic testing, but also in monitoring a treatment 50 nucleic add is isolated from remote and unimportent flank- 
modality. Accordingly, where treatment is ultimately aimed ing sequences sudi that it can be subjected tbthe 5>cdfic 
at correclmg ejq)ression level or the presence of aberrant manipulations described herein such as recbr^anl 
sequence and ^^it tissue distribution or devetopmental c^qpression, preparation of probes and primeis, and other 
expresaon, antibodies directed against the protein or rd- uses specific to the nudeic add sequences. ' * - • ' 
'TL n can be used to monitor therapeutic efficacy. . « .Moreover, an "isolated" nudeic add molecule, ^such as a 

Additionally, antibodies are useful in pharmacogenomic transoqit/cDNA molecule, can be substantially free of other 
analysis. Thus, anUT)odies prepared against polymorphic cellular material, or culture medium when produced by 




recombinant techmques, or diemical precursors or other a protein from precursor to a mature form. fcciBute protein 

chemicals when <*emicaUy synthesized. However, the trafficking, prolong or shorten protein half-Ufe or fecQitate 

nndeic acid molecule can be fiised to other coding or manipulation of a prolem for assay or proAirtion^ amoi« 

n^toor sequences and still be considered isolated. other Ihmgs. As generaUy b the case m s^j. the addiUonrf 

icguiawi jr owjutuvw* ati« amino acids may be processed away from the mature protcm 

For example. reqombinanlDNAmolcculcscontamed ma 5 by ceUular erizymes. 

vector are. coiisideredisoktedvFurtlier examples of ^ mcnUoned above, the i^hted mickic add molecules 

DNAhwlcculesinctoderecombmaiitpKAmolecu^ include, but are not limited to, tbe sequence eocoding the : 

tained in heterolo^us host cells or purified (parUally or peptide alone, the sequence encoding the . mature 

substantially) DNA molecules in solution. Isolated RNA pq,tide and additional codipg sequences, such as a leader or 

molecules include in vivo or in vitro RNA transcripts of the ^^^^^^ sequence (e.g., a pre-pro or pro-protein sequence), 

isolated DNA molecules of the present invention. Isolated sequence encoding the mature peptide, with pr without 

nucleic add molecules according to the present invention additional coding sequences, plus additional non-coding 

fiirther inchide such molecules produced synthetically. sequences, for example introns and non-oodir^g 5' arid 3' 

Acoordiii^y, the present invention provides nucleic add sequences such as transcribed but non-translated sequences 

molecules that consist of the nucleotide sequence shown in that play a role in transcription, mRNA processing 

FIG, 1 or3 (SEO ID N0:1, transcript sequence and SEQ ID (including splicing and polyadenylation signals), ribosome 

N0:3, genomic sequence), or any nucleic acid molecule that binding and stability of mRNA In addition, the nucleic.add 

encodes the protein provided in FIG. 2, SEQ ID lioa. A molecule may be fused to a madccr sequence oipoding, for 

nucleic add molecule consists of a nucleotide sequence ^ example, a peptide that fadliutes purificatioiL ... ... 

when the nucleotide sequence is the complete riuclcotide Isolated nucleic add molecules can be in the form of 

sequence of the nucleic add rnolecule, RNA, such as mRNA, or in the form DNA, induding cDNA 

The present invention fiirther provides nucleic add mol- and genomic DNA obtained by cloning or produced by 
ccules that consist essentially, of the imcleotide sequence chemical synthetic techniques or by a combination thereof, 
shown in FIG. lor 3 (SEQ ID riO:l,transcr^)t sequence and ^ The nucleic add, especially DNA, can be double-strplwl or 
SEQ ID NO:3, genomic sequence); or any nucleic add single-branded. Single-stranded nudeic acid can . be . the 
molecule that encodes the protein provided in FIG. 2, SEQ coding strand (sense strand) or the non-coding strand (anti- 
ID N0:2, A nuddc add molecule consists essentially pf a sense strand). 

nudeotide sequence when such a nucleotide sequence is The invention fiirther provides nucleic add molecules that 

present with only a few additional nucleic add residues in ^ encode fragments of the peptides of the present inycntioh as 

the final nudeic acid molecule. well as nucleic acid inolecules that encode obvious variants 

The present invention fiirther provides nucleic add mol- ; of the kiriase proteins of the present invention that axe 
ccules that comprise the nucleotide sequences shown in FIG, described above. Such nucleic add molecules may be natu- 
1 or 3 (SEQ ID N0:1, transcript sequence and SEQ ID rally occurring, such as allelic variants (same locus), para- 
N0:3, genomic sequence), or any nucleic acid molecule that 35 bgs (different locus), and ortholc^ (different prganisrn), or 
encodes the protein provided in FIG. 2, SEQ ID NO:2. A may be constructed by recombinant DNA metho^ or by 
nucleic add molecule comprises a nucleotide sequence chemical synthesis. Such non-naturally occunirjg variants 
when the nucleotide sequence is at least part of the final may be made by mutagenesis techniques, induding those 
nudeotide sequence of the nudeic add molecule. In such a applied to nucleic add molecules, cells, or organisms, 
fashion, the nucleic acid molecule can be only the nucleotide 40 Accordingly, as discussed above, the variants can contain 
sequence or have additional nucleic add residues, such as nucleotide substitutions, deletions, inversions and inser- 
nudeic add residues that are naniraUy assodaled with it or tions. Variation can occur in either or both the coding and 
heterologous nudeotide sequences. Such a nucleic add non-coding regions. The variations can produce botti con- 
molecule can have a few additional nudeotides or can servative and non-conservative amino add substitutions., 
comprises several hundred or more additional nudeotides. A 45 The present invention fiirther provides non-coding frag- 
brief dcscrq)tion of how various types of these nucleic add nients of the nucleic add molecules provided in FIGS. 1 and 
molecules can be readily madc^latcd is provided below. 3. Preferred non-coding fragments include, but are not 

In FIGS. 1 and 3, both coding and non-a)ding sequences limited to, promoter sequences, enhancer sequences, gene 

arcprovidedBecauseofthesourceofthepresentinvention, modulating sequences and gene termination sequences, 

humans genomic sequence (HG. 3) and cDNA/transcript 50 Such fragments are usefiil in a)ntn)ning betcroloj^us gene 

sequences (FIG. 1), the nucleic add molecules in the Figures expression and in devetopirig screens to identify ^ene- 

wiU contain genomic iatronic sequences, S and 3* iwn- modulating agents.. A promoter can rcadfly be identified as 

coding sequences, gene regulatory regions and non-coding being 5* to the ATG start site in the gcnormc scquwice 

intergenic sequences. In general sudi sequence features are provided in FIG. 3. . , .... . 

cither noted in FIGS. 1 and 3 or can readily be identified 55 A fragment comprises a contiguous nucleotide sequence 

usixjg computational tools known in the art As discussed greater than 12 or more nucleotides. Further, a fcagpient 

below, some of the non-coding regions, particularly gene could at least 30, 40, 50, 100, 250 or 500.nuclcoti^ m 

regulatory elements such as promoters, are useful for a length. The length of the fragment ydll be based^on its 

variety of purposes, e.g. control of heterologous gene intended use. For example, the fragmeiil can ciicode epitope 

expression, target far ideritifying gene activity modulating go bearing regions of the peptide, or can be u«M 'as DNA 

compounds, and are particularly claimed as fragments of the probes and primers. Sudi fragments can be isolated uang 

genomic sequence provided herisin. the known micleotide sequence to synthesize an oligdniicle- 

Tbe isolated nucleic add molecules can encode the olide probe. A labeled probe can then be used to screen a 

mature protein phis additional amino or carboxyl-terminal cDNA library, genomic DNA Hbrary, or mRNA .to isolate 

amino acids, or amino adds interior to the mature pepUde 65 nucleic add corre^nding to the codmg region. Further, 

(when the mature form has more than one peptide chain, for primers can be used in PGR reactions to done ^peafic 

instance). Such sequences may play a role in processing of regions of gene. 
A probe/primer typically comprises substantially a purified oligonucleotide or oligonucleotide pair. The oligonucleotide typically comprises a region of nucleotide sequence that hybridizes under stringent conditions to at least about 12, 20, 25, 40, 50 or more consecutive nucleotides.

Orthologs, homologs, and allelic variants can be identified using methods well known in the art. As described in the Peptide Section, these variants comprise a nucleotide sequence encoding a peptide that is typically 60-70%, 70-80%, 80-90%, and more typically at least about 90-95% or more homologous to the nucleotide sequence shown in the Figure sheets or a fragment of this sequence. Such nucleic acid molecules can readily be identified as being able to hybridize under moderate to stringent conditions, to the nucleotide sequence shown in the Figure sheets or a fragment of the sequence. Allelic variants can readily be determined by genetic locus of the encoding gene. The gene encoding the novel kinase protein of the present invention is located on a genome component that has been mapped to human chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC map data.

FIG. 3 provides information on SNPs that have been found in the gene encoding the kinase protein of the present invention. SNPs were identified at 42 different nucleotide positions. Some of these SNPs, which are located outside the ORF and in introns, may affect gene transcription.

As used herein, the term "hybridizes under stringent conditions" is intended to describe conditions for hybridization and washing under which nucleotide sequences encoding a peptide at least 60-70% homologous to each other typically remain hybridized to each other. The conditions can be such that sequences at least about 60%, at least about 70%, or at least about 80% or more homologous to each other typically remain hybridized to each other. Such stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. One example of stringent hybridization conditions are hybridization in 6x sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2xSSC, 0.1% SDS at 50-65° C. Examples of moderate to low stringency hybridization conditions are well known in the art.

Nucleic Acid Molecule Uses

The nucleic acid molecules of the present invention are useful for probes, primers, chemical intermediates, and in biological assays. The nucleic acid molecules are useful as a hybridization probe for messenger RNA, transcript/cDNA and genomic DNA to isolate full-length cDNA and genomic clones encoding the peptide described in FIG. 2 and to isolate cDNA and genomic clones that correspond to variants (alleles, orthologs, etc.) producing the same or related peptides shown in FIG. 2. As illustrated in FIG. 3, SNPs were identified at 42 different nucleotide positions.

The probe can correspond to any sequence along the entire length of the nucleic acid molecules provided in the Figures. Accordingly, it could be derived from 5' noncoding regions, the coding region, and 3' noncoding regions. However, as discussed, fragments are not to be construed as encompassing fragments disclosed prior to the present invention.

The nucleic acid molecules are also useful as primers for PCR to amplify any given region of a nucleic acid molecule and are useful to synthesize antisense molecules of desired length and sequence.

The nucleic acid molecules are also useful for constructing recombinant vectors. Such vectors include expression vectors that express a portion of, or all of, the peptide sequences. Vectors also include insertion vectors, used to integrate into another nucleic acid molecule sequence, such as into the cellular genome, to alter in situ expression of a gene and/or gene product. For example, an endogenous coding sequence can be replaced via homologous recombination with all or part of the coding region containing one or more specifically introduced mutations.

The nucleic acid molecules are also useful for expressing antigenic portions of the proteins.

The nucleic acid molecules are also useful as probes for determining the chromosomal positions of the nucleic acid molecules by means of in situ hybridization methods. The gene encoding the novel kinase protein of the present invention is located on a genome component that has been mapped to human chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC map data.

The nucleic acid molecules are also useful in making vectors containing the gene regulatory regions of the nucleic acid molecules of the present invention.

The nucleic acid molecules are also useful for designing ribozymes corresponding to all, or a part, of the mRNA produced from the nucleic acid molecules described herein.

The nucleic acid molecules are also useful for making vectors that express part, or all, of the peptides.

The nucleic acid molecules are also useful for constructing host cells expressing a part, or all, of the nucleic acid molecules and peptides.

The nucleic acid molecules are also useful for constructing transgenic animals expressing all, or a part, of the nucleic acid molecules and peptides.

The nucleic acid molecules are also useful as hybridization probes for determining the presence, level, form and distribution of nucleic acid expression. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain. Accordingly, the probes can be used to detect the presence of, or to determine levels of, a specific nucleic acid molecule in cells, tissues, and in organisms. The nucleic acid whose level is determined can be DNA or RNA. Accordingly, probes corresponding to the peptides described herein can be used to assess expression and/or gene copy number in a given cell, tissue, or organism. These uses are relevant for diagnosis of disorders involving an increase or decrease in kinase protein expression relative to normal results.

In vitro techniques for detection of mRNA include Northern hybridizations and in situ hybridizations. In vitro techniques for detecting DNA includes Southern hybridizations and in situ hybridization.

Probes can be used as a part of a diagnostic test kit for identifying cells or tissues that express a kinase protein, such as by measuring a level of a kinase-encoding nucleic acid in a sample of cells from a subject e.g., mRNA or genomic DNA, or determining if a kinase gene has been mutated. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by
virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain.

Nucleic acid expression assays are useful for drug screening to identify compounds that modulate kinase nucleic acid expression.

The invention thus provides a method for identifying a compound that can be used to treat a disorder associated with nucleic acid expression of the kinase gene, particularly biological and pathological processes that are mediated by the kinase in cells and tissues that express it. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. The method typically includes assaying the ability of the compound to modulate the expression of the kinase nucleic acid and thus identifying a compound that can be used to treat a disorder characterized by undesired kinase nucleic acid expression. The assays can be performed in cell-based and cell-free systems. Cell-based assays include cells naturally expressing the kinase nucleic acid or recombinant cells genetically engineered to express specific nucleic acid sequences.

The assay for kinase nucleic acid expression can involve direct assay of nucleic acid levels, such as mRNA levels, or on collateral compounds involved in the signal pathway. Further, the expression of genes that are up- or down-regulated in response to the kinase protein signal pathway can also be assayed. In this embodiment the regulatory regions of these genes can be operably linked to a reporter gene such as luciferase.

Thus, modulators of kinase gene expression can be identified in a method wherein a cell is contacted with a candidate compound and the expression of mRNA determined. The level of expression of kinase mRNA in the presence of the candidate compound is compared to the level of expression of kinase mRNA in the absence of the candidate compound. The candidate compound can then be identified as a modulator of nucleic acid expression based on this comparison and be used, for example to treat a disorder characterized by aberrant nucleic acid expression. When expression of mRNA is statistically significantly greater in the presence of the candidate compound than in its absence, the candidate compound is identified as a stimulator of nucleic acid expression. When nucleic acid expression is statistically significantly less in the presence of the candidate compound than in its absence, the candidate compound is identified as an inhibitor of nucleic acid expression.

The invention further provides methods of treatment, with the nucleic acid as a target, using a compound identified through drug screening as a gene modulator to modulate kinase nucleic acid expression in cells and tissues that express the kinase. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain. Modulation includes both up-regulation (i.e. activation or agonization) or down-regulation (suppression or antagonization) or nucleic acid expression.

Alternatively, a modulator for kinase nucleic acid expression can be a small molecule or drug identified using the screening assays described herein as long as the drug or small molecule inhibits the kinase nucleic acid expression in the cells and tissues that express the protein. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland.

The nucleic acid molecules are also useful for monitoring the effectiveness of modulating compounds on the expression or activity of the kinase gene in clinical trials or in a treatment regimen. Thus, the gene expression pattern can serve as a barometer for the continuing effectiveness of treatment with the compound, particularly with compounds to which a patient can develop resistance. The gene expression pattern can also serve as a marker indicative of a physiological response of the affected cells to the compound. Accordingly, such monitoring would allow either increased administration of the compound or the administration of alternative compounds to which the patient has not become resistant. Similarly, if the level of nucleic acid expression falls below a desirable level, administration of the compound could be commensurately decreased.

The nucleic acid molecules are also useful in diagnostic assays for qualitative changes in kinase nucleic acid expression, and particularly in qualitative changes that lead to pathology. The nucleic acid molecules can be used to detect mutations in kinase genes and gene expression products such as mRNA. The nucleic acid molecules can be used as hybridization probes to detect naturally occurring genetic mutations in the kinase gene and thereby to determine whether a subject with the mutation is at risk for a disorder caused by the mutation. Mutations include deletion, addition, or substitution of one or more nucleotides in the gene, chromosomal rearrangement, such as inversion or transposition, modification of genomic DNA, such as aberrant methylation patterns or changes in gene copy number, such as amplification. Detection of a mutated form of the kinase gene associated with a dysfunction provides a diagnostic tool for an active disease or susceptibility to disease when the disease results from overexpression, underexpression, or altered expression of a kinase protein.

Individuals carrying mutations in the kinase gene can be detected at the nucleic acid level by a variety of techniques. FIG. 3 provides information on SNPs that have been found in the gene encoding the kinase protein of the present invention. SNPs were identified at 42 different nucleotide positions. Some of these SNPs, which are located outside the ORF and in introns, may affect gene transcription. The gene encoding the novel kinase protein of the present invention is located on a genome component that has been mapped to human chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC map data. Genomic DNA can be analyzed directly or can be amplified by using PCR prior to analysis. RNA or cDNA can be used in the same way. In some uses, detection of the mutation involves the use of a probe/primer in a polymerase chain reaction (PCR) (see, e.g. U.S. Pat. Nos. 4,683,195 and 4,683,202), such as anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g., Landegran et al., Science 241:1077-1080 (1988); and Nakazawa et al., PNAS 91:360-364 (1994)), the latter of which can be particularly useful for detecting point mutations in the gene (see Abravaya et al., Nucleic Acids Res. 23:675-682 (1995)). This method can include the steps of collecting a sample of cells from a patient, isolating nucleic acid (e.g., genomic, mRNA or both) from the cells of the sample, contacting the nucleic acid sample with one or more primers which specifically hybridize to a gene under conditions such that hybridization and amplification of the gene (if present) occurs, and detecting the presence or absence of an amplification product, or detecting the size of the amplification product and comparing the length to a control sample. Deletions and insertions can be detected by a change in size of the amplified product compared to the normal
genotype. Point mutations can be identified by hybridizing amplified DNA to normal RNA or antisense DNA sequences.

Alternatively, mutations in a kinase gene can be directly identified, for example, by alterations in restriction enzyme digestion patterns determined by gel electrophoresis.

Further, sequence-specific ribozymes (U.S. Pat. No. 5,498,531) can be used to score for the presence of specific mutations by development or loss of a ribozyme cleavage site. Perfectly matched sequences can be distinguished from mismatched sequences by nuclease cleavage digestion assays or by differences in melting temperature.

Sequence changes at specific locations can also be assessed by nuclease protection assays such as RNase and S1 protection or the chemical cleavage method. Furthermore, sequence differences between a mutant kinase gene and a wild-type gene can be determined by direct DNA sequencing. A variety of automated sequencing procedures can be utilized when performing the diagnostic assays (Naeve, C. W., (1995) Biotechniques 19:448), including sequencing by mass spectrometry (see, e.g., PCT International Publication No. WO 94/16101; Cohen et al., Adv. Chromatogr. 36:127-162 (1996); and Griffin et al., Appl. Biochem. Biotechnol. 38:147-159 (1993)).

Other methods for detecting mutations in the gene include methods in which protection from cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA duplexes (Myers et al., Science 230:1242 (1985)); Cotton et al., PNAS 85:4397 (1988); Saleeba et al., Meth. Enzymol. 217:286-295 (1992)), electrophoretic mobility of mutant and wild type nucleic acid is compared (Orita et al., PNAS 86:2766 (1989); Cotton et al., Mutat. Res. 285:125-144 (1993); and Hayashi et al., Genet. Anal. Tech. Appl. 9:73-79 (1992)), and movement of mutant or wild-type fragments in polyacrylamide gels containing a gradient of denaturant is assayed using denaturing gradient gel electrophoresis (Myers et al., Nature 313:495 (1985)). Examples of other techniques for detecting point mutations include selective oligonucleotide hybridization, selective amplification, and selective primer extension.

The nucleic acid molecules are also useful for testing an individual for a genotype that while not necessarily causing the disease, nevertheless affects the treatment modality. Thus, the nucleic acid molecules can be used to study the relationship between an individual's genotype and the individual's response to a compound used for treatment (pharmacogenomic relationship). Accordingly, the nucleic acid molecules described herein can be used to assess the mutation content of the kinase gene in an individual in order to select an appropriate compound or dosage regimen for treatment. FIG. 3 provides information on SNPs that have been found in the gene encoding the kinase protein of the present invention. SNPs were identified at 42 different nucleotide positions. Some of these SNPs, which are located outside the ORF and in introns, may affect gene transcription.

Thus nucleic acid molecules displaying genetic variations that affect treatment provide a diagnostic target that can be used to tailor treatment in an individual. Accordingly, the production of recombinant cells and animals containing these polymorphisms allow effective clinical design of treatment compounds and dosage regimens.

The nucleic acid molecules are thus useful as antisense constructs to control kinase gene expression in cells, tissues, and organisms. A DNA antisense nucleic acid molecule is designed to be complementary to a region of the gene involved in transcription, preventing transcription and hence production of kinase protein. An antisense RNA or DNA nucleic acid molecule would hybridize to the mRNA and thus block translation of mRNA into kinase protein.

Alternatively, a class of antisense molecules can be used to inactivate mRNA in order to decrease expression of kinase nucleic acid. Accordingly, these molecules can treat a disorder characterized by abnormal or undesired kinase nucleic acid expression. This technique involves cleavage by means of ribozymes containing nucleotide sequences complementary to one or more regions in the mRNA that attenuate the ability of the mRNA to be translated. Possible regions include coding regions and particularly coding regions corresponding to the catalytic and other functional activities of the kinase protein, such as substrate binding.

The nucleic acid molecules also provide vectors for gene therapy in patients containing cells that are aberrant in kinase gene expression. Thus, recombinant cells, which include the patient's cells that have been engineered ex vivo and returned to the patient, are introduced into an individual where the cells produce the desired kinase protein to treat the individual.

The invention also encompasses kits for detecting the presence of a kinase nucleic acid in a biological sample. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain. For example, the kit can comprise reagents such as a labeled or labelable nucleic acid or agent capable of detecting kinase nucleic acid in a biological sample; means for determining the amount of kinase nucleic acid in the sample; and means for comparing the amount of kinase nucleic acid in the sample with a standard. The compound or agent can be packaged in a suitable container. The kit can further comprise instructions for using the kit to detect kinase protein mRNA or DNA.

Nucleic Acid Arrays

The present invention further provides nucleic acid detection kits, such as arrays or microarrays of nucleic acid molecules that are based on the sequence information provided in FIGS. 1 and 3 (SEQ ID NOS:1 and 3).

As used herein "Arrays" or "Microarrays" refers to an array of distinct polynucleotides or oligonucleotides synthesized on a substrate, such as paper, nylon or other type of membrane, filter, chip, glass slide, or any other suitable solid support. In one embodiment, the microarray is prepared and used according to the methods described in U.S. Pat. No. 5,837,832, Chee et al., PCT application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (1996; Nat. Biotech. 14: 1675-1680) and Schena, M. et al. (1996; Proc. Natl. Acad. Sci. 93: 10614-10619), all of which are incorporated herein in their entirety by reference. In other embodiments, such arrays are produced by the methods described by Brown et al., U.S. Pat. No. 5,807,522.

The microarray or detection kit is preferably composed of a large number of unique, single-stranded nucleic acid sequences, usually either synthetic antisense oligonucleotides or fragments of cDNAs, fixed to a solid support. The oligonucleotides are preferably about 6-60 nucleotides in length, more preferably 15-30 nucleotides in length, and most preferably about 20-25 nucleotides in length. For a certain type of microarray or detection kit, it may be
preferable to use oligonucleotides that are only 7-20 nucleotides in length. The microarray or detection kit may contain oligonucleotides that cover the known 5', or 3', sequence, sequential oligonucleotides which cover the full length sequence; or unique oligonucleotides selected from particular areas along the length of the sequence. Polynucleotides used in the microarray or detection kit may be oligonucleotides that are specific to a gene or genes of interest.

In order to produce oligonucleotides to a known sequence for a microarray or detection kit, the gene(s) of interest (or an ORF identified from the contigs of the present invention) is typically examined using a computer algorithm which starts at the 5' or at the 3' end of the nucleotide sequence. Typical algorithms will then identify oligomers of defined length that are unique to the gene, have a GC content within a range suitable for hybridization, and lack predicted secondary structure that may interfere with hybridization. In certain situations it may be appropriate to use pairs of oligonucleotides on a microarray or detection kit. The "pairs" will be identical, except for one nucleotide that preferably is located in the center of the sequence. The second oligonucleotide in the pair (mismatched by one) serves as a control. The number of oligonucleotide pairs may range from two to one million. The oligomers are synthesized at designated areas on a substrate using a light-directed chemical process. The substrate may be paper, nylon or other type of membrane, filter, chip, glass slide or any other suitable solid support.

In another aspect, an oligonucleotide may be synthesized on the surface of the substrate by using a chemical coupling procedure and an ink jet application apparatus, as described in PCT application [illegible] which is incorporated herein in its entirety by reference. In another aspect, a "gridded" array analogous to a dot (or slot) blot may be used to arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a vacuum system, thermal, UV, mechanical or chemical bonding procedures. An array, such as those described above, may be produced by hand or by using available devices (slot blot or dot blot apparatus), materials (any suitable solid support), and machines (including robotic instruments), and may contain 8, 24, 96, 384, 1536, 6144 or more oligonucleotides, or any other number between two and one million which lends itself to the efficient use of commercially available instrumentation.

In order to conduct sample analysis using a microarray or detection kit, the RNA or DNA from a biological sample is made into hybridization probes. The mRNA is isolated, and cDNA is produced and used as a template to make antisense RNA (aRNA). The aRNA is amplified in the presence of fluorescent nucleotides, and labeled probes are incubated with the microarray or detection kit so that the probe sequences hybridize to complementary oligonucleotides of the microarray or detection kit. Incubation conditions are adjusted so that hybridization occurs with precise complementary matches or with various degrees of less complementarity. After removal of nonhybridized probes, a scanner is used to determine the levels and patterns of fluorescence. The scanned images are examined to determine degree of complementarity and the relative abundance of each oligonucleotide sequence on the microarray or detection kit. The biological samples may be obtained from any bodily fluids (such as blood, urine, saliva, phlegm, gastric juices, etc.), cultured cells, biopsies, or other tissue preparations. A detection system may be used to measure the absence, presence, and amount of hybridization for all of the distinct sequences simultaneously. This data may be used for large-scale correlation studies on the sequences, expression patterns, mutations, variants, or polymorphisms among samples.

Using such arrays, the present invention provides methods to identify the expression of the kinase proteins/peptides of the present invention. In detail, such methods comprise incubating a test sample with one or more nucleic acid molecules and assaying for binding of the nucleic acid molecule with components within the test sample. Such assays will typically involve arrays comprising many genes, at least one of which is a gene of the present invention and/or alleles of the kinase gene of the present invention. FIG. 3 provides information on SNPs that have been found in the gene encoding the kinase protein of the present invention. SNPs were identified at 42 different nucleotide positions. Some of these SNPs, which are located outside the ORF and in introns, may affect gene transcription.

Conditions for incubating a nucleic acid molecule with a test sample vary. Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the nucleic acid molecule used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or array assay formats can readily be adapted to employ the novel fragments of the Human genome disclosed herein. Examples of such assays can be found in Chard, T, An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, Fla. Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).

The test samples of the present invention include cells, protein or membrane extracts of cells. The test sample used in the above-described method will vary based on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing nucleic acid extracts or of cells are well known in the art and can be readily be adapted in order to obtain a sample that is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention.

Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: (a) a first container comprising one of the nucleic acid molecules that can bind to a fragment of the Human genome disclosed herein; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of a bound nucleic acid.

In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers, strips of plastic, glass or paper, or arraying material such as silica. Such containers allows one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the nucleic acid probe, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers which contain the reagents used to detect the bound probe. One skilled in the art will readily recognize that the previously unidentified kinase gene of the present invention can be routinely identified using the sequence information disclosed herein can be readily incorporated into one of the established kit formats which are well known in the art, particularly expression arrays.
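
The oligonucleotide selection described in the array passage above (scan the sequence from one end, keep oligomers of a defined length that are unique to the gene and have a suitable GC content, then pair each with a control differing at the central nucleotide) can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm used by the patentee: the function names, the 40-60% GC window, the exact-substring test for uniqueness, and the toy sequences are all hypothetical, and the secondary-structure filter mentioned in the text is deliberately left out.

```python
# Sketch only: thresholds, names and toy sequences are illustrative assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_fraction(oligo):
    """Fraction of G and C bases in the oligomer."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def candidate_oligos(target, background, length=20, gc_min=0.40, gc_max=0.60):
    """Scan the target 5' to 3' and keep (start, oligomer) pairs that pass
    the GC window and do not occur verbatim in any background sequence.
    Secondary-structure prediction is NOT implemented here."""
    kept = []
    for start in range(len(target) - length + 1):
        oligo = target[start:start + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue  # GC content outside the assumed hybridization range
        if any(oligo in other for other in background):
            continue  # not unique to the gene of interest
        kept.append((start, oligo))
    return kept


def mismatch_partner(oligo):
    """Control oligomer: identical except for the central nucleotide."""
    mid = len(oligo) // 2
    return oligo[:mid] + COMPLEMENT[oligo[mid]] + oligo[mid + 1:]


# Toy usage with made-up sequences and a short oligomer length.
pairs = [(oligo, mismatch_partner(oligo))
         for _, oligo in candidate_oligos("ATGCATGC", ["GGATGCGG"],
                                          length=4, gc_min=0.5, gc_max=0.5)]
print(pairs)
```

Swapping the central base for its complement is only one way to build the one-nucleotide mismatch; any substitution at that position would satisfy the description in the text.
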
Vectors/host Cells

The invention also provides vectors containing the nucleic acid molecules described herein. The term "vector" refers to a vehicle, preferably a nucleic acid molecule, which can transport the nucleic acid molecules. When the vector is a nucleic acid molecule, the nucleic acid molecules are covalently linked to the vector nucleic acid. With this aspect of the invention, the vector includes a plasmid, single or double stranded phage, a single or double stranded RNA or DNA viral vector, or artificial chromosome, such as a BAC, PAC, YAC, OR MAC.

A vector can be maintained in the host cell as an extrachromosomal element where it replicates and produces additional copies of the nucleic acid molecules. Alternatively, the vector may integrate into the host cell genome and produce additional copies of the nucleic acid molecules when the host cell replicates.

The invention provides vectors for the maintenance (cloning vectors) or vectors for expression (expression vectors) of the nucleic acid molecules. The vectors can function in prokaryotic or eukaryotic cells [illegible]

Expression vectors contain cis-acting regulatory regions that are operably linked in the vector to the nucleic acid molecules such that transcription of the nucleic acid molecules is allowed in a host cell. The nucleic acid molecules can be introduced into the host cell with a separate nucleic acid molecule capable of affecting transcription. Thus, the second nucleic acid molecule may provide a trans-acting factor interacting with the cis-regulatory control region to allow transcription of the nucleic acid molecules from the [illegible]

The regulatory sequence to which the nucleic acid molecules described herein can be operably linked include promoters for directing mRNA transcription. These include, but are not limited to, the left promoter from bacteriophage λ, the lac, TRP, and TAC promoters from E. coli, the early and late promoters from SV40, the CMV immediate early promoter, the adenovirus early and late promoters, and retrovirus long-terminal repeats.

In addition to control regions that promote transcription, expression vectors may also include regions that modulate transcription, such as repressor binding sites and enhancers. Examples include the SV40 enhancer, the cytomegalovirus immediate early enhancer, polyoma enhancer, adenovirus enhancers, and retrovirus LTR enhancers.

In addition to containing sites for transcription initiation and control, expression vectors can also contain sequences necessary for transcription termination and, in the tran-

viruses, adenoviruses, poxviruses, pseudorabies viruses, and retroviruses. Vectors may also be derived from combinations of these sources such as those derived from plasmid and bacteriophage genetic elements, e.g. cosmids and phagemids. Appropriate cloning and expression vectors for prokaryotic and eukaryotic hosts are described in Sambrook et al., Molecular Cloning: A Laboratory Manual. 2nd. ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., (1989).

The regulatory sequence may provide constitutive expression in one or more host cells (i.e. tissue specific) or may provide for inducible expression in one or more cell types such as by temperature, nutrient additive, or exogenous factor such as a hormone or other ligand. A variety of vectors providing for constitutive and inducible expression in prokaryotic and eukaryotic hosts are well known to those of ordinary skill in the art.

The nucleic acid molecules can be inserted into the vector nucleic acid by well-known methodology. Generally, the DNA sequence that will ultimately be expressed is joined to [illegible] restriction enzyme digestion and ligation are well known to those of ordinary skill in the art.

The vector containing the appropriate nucleic acid molecule can be introduced into an appropriate host cell for propagation or expression using well-known techniques. Bacterial cells include, but are not limited to, E. coli, Streptomyces, and Salmonella typhimurium. Eukaryotic cells include, but are not limited to, yeast, insect cells such as Drosophila, animal cells such as COS and CHO cells, and plant cells.

As described herein, it may be desirable to express the [illegible] recombinant protein, and aid in the purification of the protein by acting for example as a ligand for affinity purification. A proteolytic cleavage site may be introduced at the junction of the fusion moiety so that the desired peptide can ultimately be separated from the fusion moiety. Proteolytic enzymes include, but are not limited to, factor Xa, thrombin, and enterokinase. Typical fusion expression vectors include pGEX (Smith et al., Gene 67:31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) which fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein. Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., Gene 69:301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods in Enzymology 185:60-89 (1990)).

Recombinant protein expression can be maximized in

scribed region a ribosome binding site for translation; Other host bacteria by providing a genetic bacltground wherein the 

regulatory control elements for expression include initiation host cell has an impaired capacity to proteolyticaliy cleave 

and termination codons as well as polyadenylation signals. the recombinant protein. (Gottcsman, S.; Gene Exj^e^ion 

The person ofordinary skill in the art would be aware of the ? Technology: Methods in Enzymology 185, Acdj5ciXik Pn^ 

numerous regulatory sequences that arc useful in expression San Diego, Calif. (1990) 119-128). Alternatively, the 

vectors. Such regulatory sequences are described, for sequence of the nucleic acid molecule of interest' can be 

example, in Sambrook et aL, Molecular Clomng:A Labo- altered to provide preferential codon usage for a' specific 

ratory Manual 2nd. cdL, Cold Spring Harbor Laboratory host cell, for example £. co/t. (Wada et aU Nucleic Adds 

Press. Q)ld Spring Harbor. N.Y, (1989),, : 20:2111-21181(1992)). ^ ^ -J^Td 

A variety of expresaon vectors can be used lo express a The nucleic acid molecules can also be exph:ssed by 

nudeic acid molecule. Such vectors include chromosomal, e3q)rcssion vectors that are operative in yeast Examples of 

qnsomal, and vims-derived vectors, for example vectors vectors for expression in yeast e.g., S. cerevisiae include 

d«ived from bacterial plasmids, from bacteriophage, from pYepSecl (Baldari, et al., EMBO /. 6:229-234 (1987)), 
yeast episomes, from yeast chromosomal elements, inchid- 65 pMFa (Kurjan et al^ Cell 30:933-943(1982)) pJRY88 

ing yeast artificial chromosomes, from vinises such as (Schultz et aL, Gene 54:113-123 (198^) and pYES2 

baculoviruses, papovaviruscs such as SV40. Vaccinia (Invitrogcn Cbrporalion. San Diego, Calit) ' - ' 
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The nudeic add molecules can also be e^ressed in insect recombinant vc^or constructs. Tb& marker can be contain^ ' 

cells using, for example, baculovirus expression vectors. in the same vector that contains the nucleic acid molecules 

Baculovirus vectors avaflable for expression of proteins in d^cribed herein or may be on a separate vector. Markers 

cultured insect cells (e.g.,Sf 9 cells) inchide the pAc scries include tetracycline or ampiciUin-rcsistance genes for 

rSmith et al., AfoL CeU Biol 3:2156-2165 (1983)) and the 5 prokaryotic host cells and dihydrofolate reductase or neo- 

pVL series (Luddow ct al.,. Virology 170:31-39 (1989)). mydn resistance for cukaryoUc host cells. However, any 

-Li certain embodiments of the inven^^ i^r that provide, sel^^^^ 



molecules described herein are ciqpressed in mammalian efficctive. • -\ - . ' — : • - 

cells usiiffi mammalian expression vectors. Examples of While the mature proteins can be produced m bacteria, 

marni3i egression vectors indudc pCDMS (Seed. B. « yeast, mammaUan cells, and other ceUs unto the conlro^^ 

Nature 329:840(1987)) and pNfT2PC (Kaufinan et aL, the approprUle reguhtory sequences, cefl-free transcr^^^ 

PKfnn J 6187-1 95 nL987^^ translation systems can also be used to produce these 

tMmJJ,^M ^^^Y^'P; proteins using RNA derived from the DNA constructs 

TTie expression vectors listed herem are provided by way ^^Z!l^AurJ:i^ 

of example only of the well-known vectors available , to desOTbed herein. - ■ ^ 

those of ordinanr skiU in the art that wouU be usefiil to „ Where secretion of the pepUde is desired, whidi is diffi- 

express the nudeic add molecules. The person of ordinary , cult to adiieve with multi-transmen^ranc domain contam- 

sSl in the art would be aware of other vectors suitable for ing proteins isuch as kmases. ^prppnate secretion signals 

maintenance propagation or expression of the nucleic add are incorporated into the vector: The signal sequcijce can be 

molecules described herein. TTicse are found for example in endogenous to the peptides or hcterotogous to these pep- 

Sambrook. J., Fritsh, E. R, and Maniatis, T. Molecular tides. * • : ^ ^ 

Cloning: A Laboratory Mamal 2nd, ed.. Cold Spring , Where the peptide is rrat seoetcd into the medium, yiudi 

Harbor Laboratory, Cold Spring Harbor Laboratory Press, is typically the case with kinases, the protein can be isolated 

Cold Spring Harbor, N.Y, 1989. from the host cell by standard disruption procedures, indud- 

Tbe invention also encompasses vectors in which the ing freeze thaw, sonication, medianical dMn^tion. use of 

nucleic acid sequences described herein are doned into the lysing agents and the like. The peptide can then terc^^ 

vector in reverse orientation, but operably linked to a ^ and purified by well-known purification method mdudipg 

ressulatory sequence that pennnits transcrq>tion of antisense ammonium sdfate precqiiution. acid extraction, anion or 

RNA. Tliuran antisense transcript can be produced to all, cationic exchange chromatography, phosphoccUulose 

or to a portion, of the nucleic add molecule sequences chromatography, hydrophpbic-mtcraction chromatography, 

describedherein. including both coding and non-coding affinity chromatography. hydroxykpatUe clffomat^hy, 

regions. Expression ofthis antisense RNAis subject to each 30 lectin chromatography, or hig|i performance l^md chroma- 

of the parameters described above in relation to expression tography. . , : \' 

of the sense RNA (regulatory sequences, constitutive or it is also, imderstopd that depending 

inducible expression, tissue-specific expression). rccombiriant ptbduction of the peptides dcscrfted herein 

TTie invention also relates to recombinant host cells peptides can have various glycosyUtio^patte^ 

containingthevectorsdescribedherein.Hostcellstiierefore 35 upon the cell, or rnaybe non-glyc^latcd as w^ 

include prokaryotic cells, lower eukaiyotic cells sudi as in baderia. In addUion. Uie pepUdes may mclude an iniml 

yeast, other eSaiyotic oTus such as insit cells, and higher modified methiomne m some cases as a result of a host- 

eukaryotic cells such as mammalian cells. mediated process. 

The recombinant host cells are prepared by introducing Vectors and Host Cells 

the vector constructs described lircin into the cells by 40 ^ , 

techniques readfly available to the person of ordinary skill in The recombinant host cells expressing the pepUdcs 

the art. These include, but are not limited to, calcium described herein have a variety of uses. Fust, the cells we 

phosphate transfection, DEAE-dextran-mediated useful for producing a kinase protem or pepUde that can be 

transfection, cationic lipid-mediated transfection, furtha purified to produce desired amounts of kmase protem 

electroporation, transduction, infection, -lipofection, and 45 or fragments, llni^ host cells OTutaming cxpr^on vecto 

other te<iniques such as those found in Sambrook, ct aL are uscfiil for peptide production. 

(Molecular Ci(mmg: A Laboratory Manual, 2nd, ed. Cold Host cells are abo useful for conducting cell-based assays 

Spring Harbor Laboratory, Cold Spring Harbor Laboratory involving the kinase protein or kinase protein fragments. 

Press, Cold Spring Harbor, N.Y., 1989). such as those described above as well as other formats 

• Host cells can conUin more than one vector. Tlius. dif- 5. known in the art Thus, a recombinant host ccU expressing 

ferent nucleotide sequences can be introduced on different a native kinase protein is usefiil for assaying compounds that 

vectors of the sarne cell. Similariy, the .nucleic add mol- . stimiilate or. inhibit kinase protew fimction. y . 

teaks can be introduced either alone or with other nucleic Host cells are also useful for identifying kiiiase protein 

add molecules that are not related to the nudeic add mutants in which these fonctions are affected. If tiieonitants 

molecules such as those providing Uans-acting fectors for naturally occur and give rise to a pathology,' host cells 

e]q)ression vectors. When more tiian one vector is intro- " containing the mutations are usefrd to assay conipounds that 

duced into a .cell, the vectors can be introduced have a desired effect on the mutant kinisse protein (far 

independendy, co-introduced or joined to the nudeic add example, stimulating or inhibiting fimrtion) i«iiich may not 

molecule vector. be indicated by their effect on the native kinase proteia 

In the case of bacteriophage and viral vectors, these can Genetically engineered host cells can be further iiscd to 

be introduced into cells as padcaged or encapsulated virus ^ produce non-human transgenic animals, A transgenic animal 

by standard procedures for infection and transduction. Viral is preferably a mammal, for example a rodent, such as a rat 

vectors can be replication-competent or replication- or mouse, in whidi one or more of the oeDs of the animal 

defective. In the case in which viral replication is defective, ^ inchide a transgene. A transgenc is exogenous DNA wiuch 

replication will occur in host ceUs providing functions that is integrated into the genome of a cell froin which a 

complement the defects. 6S transgenic animal develops and whidi remains in tiic 

\fectors generally include selectable madceis that enable genome of tbe mature animal inone or more cell typ<f or 

the selectiwi of the subpopulation of cells that contain tiie tissuesof the transgenic ammal.Tliese anmials are useful fior 
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studyipg the function of a kinase protein and identifying and binase and a selected protein is lequired. Such animals can' 
evaluating modulators of kinase protein activity. Other be provided through the constructioD of ''double" transgenic 
examples of transgenic animals include non-human animals, e.g., by mating two transgenic arn'mals, one con- 
primates, sheep, dogs, cows, goats, chickens, and amphib* taining a transgene encoding a selected protein and the other 

. ians. 5 containing a transgene encoding a rccQmbinase. 

A transgenic animal, can be produced by introducing Clones of . the non-human transgenic amniak described 

nudeic add into the male pronuclei of a fertilized oocyte, herein can also be produced according to the aiieth6ds 

e.g., by microinjection, retroviral infection, and allowing the described in "WQmut, I. et at Nature 3i85:8iO-S13 (1997) 

oocyte to develop in a pseudopregnant female foster animal and PCT International Publication Nos. WO 97/07668 and 

Any of the kinase protein nudeotide sequences can be WO 97/07669. In brie^ a cell, e.g^ a somatic cell; from the 

introduced as a transgene into the genome of a non-human transgenic animal can be isolated and induced to exit the 

animal, such as a mouse. growth cycle and enter phase. The quiescent cell can then 

. Any of the regulatory or other sequenceisusefiil in expres- be fused, e.g., through Uie use of electrical pulses^ to an 

sion vectors can form part of the transgenic sequence. This emicleated oocyte from an animal of the same spedes firom 

includes inttoriic sequences and polyadpiiylation signals, if j5 which the qui|MCcnt cell is isolated. The reconstructed 

not already included. A tissue-spedfic regulatory sequence oocyte is then cultured such that it develops to morula or 

(s) can be operably linked to the traiisgene to direct expres- blastocyst and then transferred to pseudopregnant female' 

sion of the kiriase protein to particular cells. ■ . foster animal The ofiEspring bom of this female foster 

. Methods for generating transgenic animak via embryo animal will be a doiwo^^ 

man^)ulation and microinjection, particularly animals such ^ ™ somaUc cell, is isolated. 

as mice, have become conventional in the art and are Transgenic animals containing recombinant cells that 

descrft»ed, for example, in VS, Pal No& 4,736^6 and cxpttss the peptides described herein are useful to conduct 

4,870,009, both by Leder et al, U^. Pat No. 4,873,191 by the assays described herein in an in vivo context 

Wagner et al and in Hc^an, B., Manipulating the Mouse Accordingly, the vvidus physiological factors that are 

£m6/yo, (Cold Spring Harbor Laboratory ftess, Cold Spring present in vivo and that could eflfect substrate binding. 

Harbor, N.Y, 1986). Similar methods are used for pfoduc- ^ kinase protein activation, and signal transduction, may not 

• tinn nf nther rranjggenic animal,*; ^ A transgemc ffiurider km- be evident from in vitro cell-ficee or oeU-based assays. 

mal can be ulentified based upon the firesenoe of the Accordingly, it is useful to provide oon-human transgenic 

transgene in its genome and/or expression of transgenic animals to assay in vivo kinase protein function, indudirig 

mRNA in tissues , or cells of the animalj^: A transgenic substrate interaction, the effect of specific irmtant kinase 
founder animal can then be used to breed additional anirnals 30 proteins on kinase protein function and substrate interaction, 

carrying the tian^ene. Moreover, transgenic animals carry- . and the effect of chimeric kinase proteins. It is also poisable 

ing a tratisgene can further be bred to other transgenic to assess the effed of nuU mutationis, that is, mutations that 

am'friak carrying other transgenes. A trknsjgenic animal also substantially; or completely eliihinate one or more kinase . 

includes animals in which the entire aninial or tissues in the protein functions. ' 
ammal have been produced using the homotogouslyrecom- 35 All publications and patents mentioned in the above 

binant host cells described herein. specification are herein incorporated by reference. Various 

In another erhbodunent, transgenic non-burnan anicaals modifications and variations of the described method and 

can be produced which contain selected systerns that allow system of the invention will be apparent to those skilled in 

for regulated expression of the transgene. One example of the art without departing firom the scope and spirit of the . 
such a system is the cre/loxP recombinase system of bacte- 40 invention. Although the invention has been described in 

riophage PI. For a desarq}tion of the creyioxP recombinase connection with specific preferred embodimertts, it should 

system, see, e.g., Lakso et al. PNAS 89:6232-6236 (1992). be understood that the invention as claimed shodd not be 

Another example of a recombinase system is the FLP unduly limited to sudi specific embodiments. Indeed, van- 

recombinase sysiem of £ cerevisiae (O'Gormaii et al Sd- ous modifications of the above-described modes for canyitig 

ence 251:1351-1355 (1991). If a cre/bxP recombinase out the invention whidi are obvious to those skflled in the 

system is used to regulate egression of the transgene, field ofmolecularbiology or related fields are inteiided to be 

animals containing transgenes encoding both the Cre recom- within the scope of the following claims. ~ - ^ 



SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 4

<210> SEQ ID NO 1
<211> LENGTH: 2320
<212> TYPE: DNA
<213> ORGANISM: Human

<400> SEQUENCE: 1

ccca^ggcgc cgtaggcggt gcatcccgtt cgegcctggg gctgtggtet tcccgcgcct ' 60 

gaggcggcgg cggcaggagc tgaggggagt tgtagggaac tgaggggagc tgctgtgtcc 120 

cccgcctcct cctccccatt tccgcgctcc cgggaccatg tccgcgctgg cgggtgaaga 180 

tgtctggagg tgtccaggct gtggggacca cattgctcca agccagatat ggtacaggac ' 240 
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t^rtcaacgaa occtggcacg gctcttgctt ccggtgaaag tgatgcgcag cctggaccac 300 

cccaatgtgc tcaagttcat tggtgtgctg 'tacaaggata agaegctgaa cctgctgaca * 360 

■gagtacattg aggg'gggcac .actgaagg-ac tttctgcgca gtatggatcc ^tcccctgg 42.0 

cagcagaagg tcaggtttgc caaaggaatc gcctccggaa.tggacaagac tgtggtggtg '480 

. gcagactttg ggctgtcacg gctcatagtg gaagagagga aaagggcccc catggagaag* ~ 540 ' 

gccaccacca egaaacgcac cttgcgcaag aacgaccgca agaagcgcta cacggtggtg 600 

ggaaacccct actggatggc ccctgagatg ctgaacggaa agagctatga tgagacggtg 660 

gatatcttct cctttgggat cgttctctgt gogatcattg ggcaggtgta tgcagatcct 720 

gactgccttc cccgaacact ggaetttggc ctcaacgtga agcttttctg ggagaagttt 780 

gttcccacag attgtccccc ggccttettc ccgctggccg ccatctgctg cagactggag 840 

cctgagagca gaccagcatt ctcgaaattg gaggactcct ttgaggccet ctccctgtac 900 

ctgggggagc tgggcatccc gctgcetgca gagctggagg agttggacca cactgtgagc 960 

atgcagtacg gcctgacccg ggactcacct ccctagccct ggcccagccc cctgcagggg 1020 

ggtgttctac agccagcatt gcccctctgt gccccattcc ^ctgtgagc agggccgtcc 1080 

gggcttcctg tggattggcg gaatgtttag aagcagaaca aaccattcct attacctccc 1140 

caggaggcaa gtgggcgcag caccagggaa atgtatctcc acaggttctg gggcctagtt 1200 

actgtctgta aatccaatac ttgcctgaaa gctgtgaaga agaaaaaaac ccctggcctt 1260 

. tgggccagga ggaatctgtt actcgaatcc acccaggaac 'tccctggcag tggattgtgg 1320 

. gaggctcttg cttacactaa tcagc'gtgac ctggacctgc tgggcaggat cccagggtga 1380 

acctgcctgt gaactctgaa gtcactagtc cagctgggtg caggaggact tcaagtgtgt 1440 

ggacgaaaga aagactgatg gctcaaaggg tgtgaaaaag tcagtgatgc tccccctttc 1500 

tactccagat cctgtccttc ctggagcaag gttgagggag taggttttga agagtccctt 1560 

aatatgtggt ggaacaggcc aggagttaga gaaagggctg gcttctgttt acctgctcac 1620 

tggctctagc cagcccaggg accacatcaa ^tgtgagagga agcctccacc tcatgttttc 1680 

aaacttaata ctggagactg gctgagaact -tacggacadc atcctttctg tctgaaacaa 1740 

acagtcacaa gcacaggaag aggctggggg actagaaaga ggccctgccc tctagaaagc 1800 

tcagatcttg gcttctgtta ctcatactcg ggtgggctcc ttagtcagat gcctaaaaca 1860 

ttttgcctaa agctcgatgg gttctggagg acagtgtggc ttgtcacagg cctagagtct 1920 

gagggagggg agtgggagtc tcagcaatct cttggtcttg gcttcatggc aaccactgct 1980 

cacccttcaa catgcebggt ttaggcagca gcttgggctg ggaagaggtg gtggcagagt . 2040 ~ 

etcaaagctg agatgctgag agagatagct ccctgagctg ggccatctga cttct'acctc 2100. 

ccatgtttge tctcccaact eattagctcc tgggcagcat cctcctgagc cacatgtgca 2160 

ggtactggaa aacctccatc ttggctccca gagctctagg aactcttcat cacaactaga 2220 

tttgcctctt ctaagtgtct atgagcttgc accatattta ataaattggg aatgggtttg 2280 
gggtattaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 2320

<210> SEQ ID NO 2
<211> LENGTH: 255
<212> TYPE: PRT
<213> ORGANISM: Human

<400> SEQUENCE: 2

Met Val Gln Asp Cys Gln Arg Asn Leu Ala Arg Leu Leu Leu Pro Val
1               5                   10                  15

Lys Val Met Arg Ser Leu Asp His Pro Asn Val Leu Lys Phe Ile Gly
            20                  25                  30

Val Leu Tyr Lys Asp Lys Lys Leu Asn Leu Leu Thr Glu Tyr Ile Glu
        35                  40                  45

Gly Gly Thr Leu Lys Asp Phe Leu Arg Ser Met Asp Pro Phe Pro Trp
    50                  55                  60

Gln Gln Lys Val Arg Phe Ala Lys Gly Ile Ala Ser Gly Met Asp Lys
65                  70                  75                  80

Thr Val Val Val Ala Asp Phe Gly Leu Ser Arg Leu Ile Val Glu Glu
                85                  90                  95

Arg Lys Arg Ala Pro Met Glu Lys Ala Thr Thr Lys Lys Arg Thr Leu
            100                 105                 110

Arg Lys Asn Asp Arg Lys Lys Arg Tyr Thr Val Val Gly Asn Pro Tyr
        115                 120                 125

Trp Met Ala Pro Glu Met Leu Asn Gly Lys Ser Tyr Asp Glu Thr Val
    130                 135                 140

Asp Ile Phe Ser Phe Gly Ile Val Leu Cys Glu Ile Ile Gly Gln Val
145                 150                 155                 160

Tyr Ala Asp Pro Asp Cys Leu Pro Arg Thr Leu Asp Phe Gly Leu Asn
                165                 170                 175

Val Lys Leu Phe Trp Glu Lys Phe Val Pro Thr Asp Cys Pro Pro Ala
            180                 185                 190

Phe Phe Pro Leu Ala Ala Ile Cys Cys Arg Leu Glu Pro Glu Ser Arg
        195                 200                 205

Pro Ala Phe Ser Lys Leu Glu Asp Ser Phe Glu Ala Leu Ser Leu Tyr
    210                 215                 220

Leu Gly Glu Leu Gly Ile Pro Leu Pro Ala Glu Leu Glu Glu Leu Asp
225                 230                 235                 240

His Thr Val Ser Met Gln Tyr Gly Leu Thr Arg Asp Ser Pro Pro
                245                 250                 255



<210> SEQ ID NO 3
<211> LENGTH: 59065
<212> TYPE: DNA
<213> ORGANISM: Human

<400> SEQUENCE: 3

tcatccttgc gcaggggcca tgctaacctt ctgtgtctca gtccaatttt aatgta^gtg " 60 

ctgctgaagc gagagtacca 9aggtttttt tgatggcagt gacttgaact tatttaaaag 120 

ataaggagga .gccagtgagg gagaggg^tg ctgtaaagat aa'ctaaaagt gcactt'cttc ''180 

taagaagtaa gatggaatgg gatccagaac aggggtgtca taccgagtag cccagccttt ' "240 

gttccgtgga cactggggag tctaacceag agctgagata gcttgcagtg tggatgagcc * ' 300 

agctgagtac agcagatagg gaaaagaagc caaaaatctg ~aagta'gggct 99ggtgaagg ' 360 - 

acagggaagg gctagagaga catttggaaa gtgaaaccag gtggatatga gaggagagag "420 

tagagggtct tgatttcggg tctttcatgc ttaacccaaa gcaggtacta aagtatgtgt ' "480 

tgattgaatg tctttgggtt tctcaagact ggagaaagca gggcaagctc tggagggtat 540 

ggcaataaca agttatcttg aatatcctca tg^tggaaag tcctgatcct gtttgaattt 600 

tggaaataga aatcattcag agccaagaga ttgaattgtt gagtaagtgg gtggtcaggt 660 

tacagactta attttgggtt aaaaagtaaa aacaagaaac aaggtgtggc tctaaaataa 720 
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tgagatgtgc tgggggtggg gcatggcagc tcataaactg accctgaaag ctcttacatg .780 
toagagttcc aaaaatattt ccaaaacttg gaagattcat ttggatgttt gtgttcatta 840 
aaatetctca ctaattcatt. gtcttgtcca ctgtccgtaa cccaadctgg satt^gtttg. 900 
agtgagtctc tcagactttc tgccttggag tttgtgagag agatggcata ctctgtgacc ,960 
actgtcaccc taaaaccaaa aaggcccctc -ttgacaagga gtc^gaggat tttagaccca 1020 
ggaagaatga gtgatgggca tatatatatc ctattactga ggcatgagaa gagtggaatg 1080 
ggtgggttga ggtggtgttt taaggcctct tgccagcttg tttaactctt ctctggggaa 1140

cgagggggac aactgtgtac attggctgct ccagaatgat gttgagcaat cttgaagtgc 1200 ^ 

caggagctgt gctttgtcta ttcatggccc ctgtgcctgt gaaacagggt tcggtgactg 1260^ 

tcactgtgcc tgtggcagtc tgtagttacc cagagagaac aaagetgcat acacagagcg 1320 

cacaagggag tcttgtaaca accttgtcct gctttctagg gctgagtcag gtaccacagc 1380 

ttgatctcag ctgtcctctt tatttcaaga agttgacatc tgagccatac caggagtatt 1440 

gtattttgtt tgaggcctct ctttttggag gaacatggac egactctgtg cttttgtcta 1500 

tgctggtctc tgagctcaca caacccttca ccctcctttc tcagccagtg ataggtaagt 1560 

cttccctatc ttgcaaggct cagctcaagt gtcagcttcc tctacaaaga ctttcctggt 1620 

tcccctcatt ggagtgaaca agagttgaca tggtagaatg gaaagagcag aagctttaga 1680 

atgagccaga cctgagtatg aatgc^gat ccaccact^ gctagtcaac cctgccccct 1740 

gcctcaagtt ttaattttcc tatccattaa gtgaatataa taatacctgt gtcacaggat. 1800 . 

tattttgaga attaaatgag attaggtcta tgaaagcacc tagcagagtt cttggcatat 1860

aggaggcatt cattaaatat ttgttcttcc ccttttatac ccattacttt tctttttctg 1920 

aactaaaata atacttggtt ctatctctga aataacatcc aagtgaaaaa tcaacaacat 1980 , 

gaaagagcag ttcttttcca gtggatttgc ttcttaagga gcagagatta tgtaatctaa 2040 

cagcctccaa catacaaaga gctttgtatc tagaacaggg gtccccagcc cctggaccgc 2100 

caactggtac gggtctgtag cctgttagga accaggctgc acagcaggag gtgagcggcg 2160

ggceagtgag cattgctgcc tgagctctgc ctcctgtcag atcagtggtg gcattagatt 2220 

ctcataggag tgtgaaccct attgtgaact gcacatgcaa gggatctggg ttgcatgctc 2280 

cttatgagaa tctcactaat ggctgatgat ctgagttgga acagtttgat accaaaacca 2340 

tccccecgcc ccccaacccc cagcctaggg tccgtggaaa aattggcccc tgsrtgccaaa 2400 

aaggttgagg actgctgatc tagaggacca atttattcaa tgttggttga gtaaatgagc 2460 

tcttggatta ggtgatggaa aaatctgaaa aaacagggct tttgaggaat aggaaaaggc 2520 , 

agtaacatgt ttaacccaga gagaagtttc tggctgttgg ctgggaatag tcataggaag 2580 

ggctgacact gaaaagaagg agattgtgtt cgtttcttct tctcagagct ataagcaaag .2640 

getgaaagtt ctagaaaaag geaagttttg tttcagtaga oaaaaggata atcagaacca . 2700 

tttttagaaa atggaatgag actacttttg aggccatgag .ttccttgtcc ctggagagat 2760 

gagcagaggt tggacaagtg cttaccagag atcttgtgga ggcagaaact gtgcatctag . 2820 

cagagcattg gcctaaccct ttcaaatgag atgctgttaa etcagtctta ttctacatgg 2880 

taggaatcct gtccctttgc ctcctgctac tttgggcctc tcaacctctt ggttttgtgt 2940 

gcaggtgaag atgtctggag gtgtccaggc tgtggggacc acattgctcc aagccagata . 3000 

tggtacagga ctgtcaacga aacctggcac ggctcttgct tccggtaggt gggcctatcc . 3060 

tcccatcttt accagtgtac tatgggccaa gcactatttc atgttctgat ggaaaacaca 3120 
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goAacaagct tctgagttga gaatttcaat cttagggtgg' ggaaaggaat gbaccaagga 3180 
a^fSCtcatg accaoacctc eagtgtggcc cccctgaacc caggttaaat tggaagagcc 3240 
ataaatgggc cagctggagg cagggtgggg^ ggatgaga^g i gccctttcc agggttgtcc 3300 
catatccctc actttatggg tgaggaaact gaggcccagg aagagtgact ttcctgtggc 3360 
tgcactacag attatgcagg tacttcaaga gttgtttgta ttcttatttt attttatttt 3420 
attttatttt attttatttt attttatgag agggattctt gctgttgccc aggctggagt 3480 
gcagtggtgc aatctcggct cactgcaatc tctgcctgct gggttcaagt gatttttctg 3540 
ccttagcttc ctgagtagct gagatgacag gcacctgcca ccatgcgcag ctaatttttg . 3600 
tattttagtg gagacggggg tttcaacatg ttggtcaggc' tggtcttgaa ctcctgacct '\ 3660 
caaatgatgc acccacctcg acctcccaaa gtgctggaat: tacaggcgtg aaccactgtg 3720 
cccagccaag agttgttttt agtgtggttg gcagagccag ctcttcettc accacaggat '3780 
gcctccctag gttcctactt tttgttacta gcttttatta tagctatatt attattatta 3840 
ttattattat tattattatt attattgaga cagagtctcg ctctgtcgcc caggctggtg 3900 
tacagtggtg cgatcccggg ctcactgcaa cctctgcctc ccgagttcaa gcagttctcc " 3960 
tgcctcagcc ccccgagtag gtgggactac aggcgcctgc caccacaccc ggctaatttt 4020 
tgtattttta gtagagacgg .ggtttcacct tgttgaccag gctggtctgg agctcctgac 4080 
ctcaggtaag tgctagaatc acaggcgtga accactgcgc ccagccaaga gttgttttta 4140 
gtgtggttgg cagagccagc tcttcct'cac cacaggttgc ctc.cctaggt tcctactttt 4200 
tgttactagc tttattatag ctacattatt a ttatt a ttg "ttattattat tgagacagag -4260 
tetcgctctg tcgcccaggc tggtgtacag ^gatgtgatc "ttggctcact gcaacctctg 4320 
ccccccgagt tcaagcaatt ctcctgcttc agccccccta gtaggtggga ctccaggcac 4380 
ctgccaccac gcccagctaa tttttgtatt tttagtagag gcggggtttc accttgttgg 4440 
ccaggctggt ctcaaactcc tgacctcagg 1:gatccgcct gcctcggcct cccaaaatgt 4500 
tgggattaca ggcatgagcc accgcgccct gcctatagct acattatttt tgtaggcagc 4560 
tcagtttctt aaaaattata cagacttcaa atcagatttg ttcctgctgt ctgaggctca ; 4620 
gtttcttcat ctggaaaatg gatggtaata atcttgttga gattgaatga aataatatat 4680 
gcagtgtatc cagtacatgg tagacaccca gtgaatggtt attccttect cccatcggat 4740 
tggaattctc aagggtggga acttgtcttt atattcttca caacgtaaaa tagttgaaat 4800 
ttgttggtgg aaagaagagc agtccactcc agaggctgga tgggcatgcc tggcccccaa ■ 4860 
ggtctgaagt ggtagggctg tgcctatatc ctgagaatga gatagactag gcaggcacct . 4920 • 
tgtgctgtag attccagctc ctgcacatag ctxttgttgt'Waaacatccc igtgdttata 4980 
ccaagtaatt gagttgacct ttaaacactt gcctcttccc tgggaaccat ataggggatt 5040 
ggcctggaga cgtctggcct ctggaagagt tggaaagcag ccatcattat tatcctttcc 5100 
tttcagctat aactcagagc tctcaagtct tttctgtgga tcttattgcc ttggttcttg 5160 
ccccttttac tcccagggaa gttgattctg tcttttctgt tccatttagt atgacaggag 5220 : 
cagagaatgt cagagctgta agggacctta tagttaaagc ctttggctgg tcctttcatt -5280 
-ttatagctgg gactaataag taacgtcaaa acccaatgag ttcacagatt gggtctcgcc 5340 
ttggcatgta acccatatgt tcatattctt gctgttttcc tatgtgtatg aatattttct - 5400 
atccaaaata agcaggacag ggtagagcaa gttaatcttt gg««tttctg gattctctta 5460 
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gagctaaaaa acttcagaac tagaagaaac cacccactat atggtataac ccattcatat 5520 
cacagatgag gcctgaaacc. aaaaagactt gctcaggcca .tggatgacaa gagctggccc 5580 
tagcactgaa ctcttgggtc atttgtaggt ctagtcagat gctagettgt tagctctgtg 5640 
egtgcgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgagat agagacagaa agataacata 5700 
tgtacacaaa tacataaaga ggaagtagac acgttageat ggtagataag agtacaggca 5760 
ggccaggcgt ggtggctcac gcctgtaatc ccagcacttt .gggaggccaa ggcaggtgga , .5820 
tcacctgagg tcaggaattc gagaccagcc tgaccaacat ggtgaaaccc catctctact 5880
aaatacagaa aaaaattagc ttggcatggt ggcacatgcc tgtaatccca gctacttggg . 5940 
aagctgaagc aggagaatcg cttgaatccg ggaagcagaa gttgcagtga gccgagattg ^ 6000 
tgccattaca gtctagcctg ggcaacaaga gggaaactcc atcgcaaaaa aacaaccacc , 6060 
accaagagta caggctatgg aatgagacta tggttttaaa tcc^gcttt gcaatttatt ,6120 
aactagcctt aagtgacttc cctgagcttc aggcaccaat ctg^aaatg aggataagaa 6180 

tattactcat gccacatggt tgttagggag gattaaatgt gataacctat ataaagtggc 6240 

tagcatagca tctgacatat agaaaactct taatagggcc ggacgtggtg gcttatgcct .6300 
gtaatcctag cactctggga ggccgaggca gaaggatcgc ttgagcccat gagcccagga . 6360 
gtt-tgagacc agcctggcca acatggcaaa actccacctc "tacaaaaaat acaaaaatat .^6420 
tagccaggcg tgatggcaca cacctgtagt cccagctact tgggaagctg aggagcgatg ^ 6480 
attacctgag cccagggata tcaaggctgt agtgagctgt gatcatgcca ctgtactcca , . 6540 
. tccagctggg ggacagagtg aaacccctgt ctcaaaacaa aacaaatgaa aaaaaaaacc. _ 6600. . 
cttaataatc agtaactgtc actttatatt atgttgtgag tgtgtgtcta tatacaccta 6660 
tatgtataca tttctcttat tacacattca ^tggtgatct gatgtggagc cccagggatt 6720 
aagggcaact ttgaactacc ctgacacaat caagccaaat atcattcccg tggaggaag^ ^780. 
agagtatcta ggttctgtct cctagttgca gctttacctt gaggacagag actctaatcc 6840, 
agctgtgctg aaggagcaca tctcctgact tctgagcttt cccctggtaa attcaaactg .6900 
gatgtcacgg cgccctcaga tagagcctgg taatttgccc tggggagagt gactgtcttt. , .6960. 
tggatctaat ttgacttttg ccccagttgg aggaaaatct teagggctag gaaggattgt , 7020. 
atttgtctga ccccagagat aacctgggtt ttgaggaaca tggggcatca acctgaatgg 7080
tettgtaaga tctctcccac gccagcttgc cagtgtttct etgatgaatt tagagtaeeb 7140 
gagtagtgca ggcctgctgg gaggaggact ctccctctgt gctactcaga gaaattcatt . 7200 
c'ttcaaggcc cccttccagc cttgctctta cccagctggg .etacagttac aataaaggaa . 7260, ' 
atgacttttc ttctcccctt cccccagtac ctttgttttc etagtcacag ggt:ggggctg^ ..^ .7320 
gatattgaat ggagaaattg ctggggtcca tcctaaactc ctcccctcat ctctccctta ...7380 . 
cattacccca ttcttctgtc tgcagccaca tccaiaatcc tgcctctgtt agccttccga ^..7440 
cagaccctca ggtgcccagg acaacaggaa gctacttaaa gctggaacet cagactgtgc, : 7500 
aatggaggcc agtgacaaaa ctgaaagtag ctctgtcagt aa-ttgtgctg gtgcgattag ,..7560 
gcagctggcc agaatctttt ggatctcctg gaca^atggc tgactagtcc t^cccaagcct ,^7620 
tcccaacagg cctctttttt ttcctttttt tcttttcttt tttttctttc tttctttctt . ,7680 
tctttttttt "ttttttt^g gctagtgaag tgaaattgtg ggagtggaaa aggaacaaag . 7740 
aaatcggtaa ctggtagtga tcaattactt gtaaaeacta ttgtacttgg accagcccag 7800
taggcctttt ttaaaactct gagttacctc tctttccttt ccttgagcag tgccattaat 7860 






tctgtatctg gggcaatcct ttctgatgtt ctctggacct ggctctctct ccttagg«ga' '7920 
ggccaggaga gtagccagag . agcatgtcat ttgtagctga ggttaaagt^ ' tggagctatc '7980 
aatggtgacc tggcctcttg gcatgttagc aagccagagg accttgacaa cttttttgat - 8040. 
gattgtccgt tcaccctgat caaaggtg^^ tggcttagga ggagggaaga aaagctaccc 8100
ctattagtct tgatggcccc agcgtgggtc tctattgctt gacctggttc ctagcagcat '8160 
tatcagaagg aaaatccacc gctcttaagg ctcctgggaa cttteaggac ttcctttctc ' 8220 
aggattgcaa acataagact atttgagct-t tcacttttga aaagcggtta ctaatacct:a 8280 
tactctggga aagggctaat gcagatagaa gactgtggtc actgcatcag gcaacagacc . 8340 - 
atttccgcta aatttagtga ctccaggaag gccagtgaag aaataacaca cgtogcaacc ' ' 8400 
agagactgtg ttgtaatatg ttggctgaca gcagggtact' ttctgtgatg^ ctgaaagcca 8460 
cattcatttt ctctcccctc atccccatct aagcaagcct ggtagaatca taattacagt '8520 ^ 
aataggtacc acttattgag tactctgtgc cagabaccct cctgagcata cgacatgcat ' /8580 
agcacattta atccttacaa ^gacttaa^ aaatgtagta c^gtcttac c^c^tcgag 8640^ 
aatagggaaa tggaggttac ttgtttaaag tcacagagct ' aataggtagc atagctgaga 8700 
tttgaactca ggcattctta ctccttgcct gcaagagtct cttggcattc ttgaatgcaa - 8760 
gcatatttct taacctcact gaggctcagt ttcctcttat ataatatggg gtaaagagcc ■ 8820 
ctcaccctgc ctgccacaca etggtagtgt cagataacat tgaagggtgt tagtttaaag ' - 8880 
gcttcatgga ctctataat^ tcaacaaaag tgctgttaac tttcttctgg. gtctca^^ . 8940 
cctgatgtag agtcagtgga gcaaccctgc catctgctgt tatgctgttg' atgttgctgc 9000 
cacacttact aacctaaacc tttgattctg gctgtggcct tctcc'agaag ^tgtttactc '■ 9060 
atttgtccag tttatctttt aggaaacagc cagcccgtag atcattaagg ctggctattg 9120 
gacagggggc tggggcctgc ctgacagagg aaggaagggc agacatctgg ttcttcctct 9180 
gcccctacaa gagactccag cctgaccaca gagtggtact cctaggatgt agcagcagca '9240 
tatgagcttg aatgtgcctt aatcctgctc 'tttactttga gaagagagaa ctaaggaccc 9300 
acagatgttt cacagcttct ataggaggca gaggtagaaa aatggagaga gatgaggcca' 9360 
gagatagata actgatatta attaaacgtt gtattaagaa cctcacttag ' attatctgat 9420 
tcaatcttca taataaccct gcaaccccca cctttttttg agaacagggt cttgctctgt .9480 
tgtccaggct acagtgcact ggtacaatca tagttcactg cagtgtcaac ctcctgagct 9540 
caagcaatcc tcccacctca gccttgcaag cagcttggac' tacaggcgtg ccaccacacc ' 9600 
ttgccatttt.tttttatttt aagtagaaac aaggtctt^t -taatactatg ttgcccaggc 9660 
tggtcttgaa ctccagcgat cctcet'gccc cagcctccca aagtgcttgg gattacggaa .'.9720. 



gtaagccact gtgcctggcc agtgcaacec ceattttata ctaaaacagg aaggcceaga 9780
aaggtttgga gtaacttgtc cagggtcaca cagatgatat^ttgaactcag gtctecctgg '~9840 
ctcccaagag agtetgcttt ccactaggac tcccaggaga aaaaaaaaaa aaaaaaeagt 9900 
agacttggag acagaaaatc tgatttgagt cttagttgag etaggetaae tgtgtaaetg ! ' 9960 
tgggcaagtt ccttagcccc tgtgagcctc agtttcttat ctgtaaaatg tcataaaaga '10020. 
aatccatctc a^ggagtagt tgtgatgatc aaggactctg aaaacattag aatggtttaa 10080 
tgtgaaggat tagcagcagc acatggcaac attgtgcatc ^tatattaac tatccaaata 10140 
tatcaagcgt catttgctat atataaaagt cateaoatta ggcaetgtgg gggataegga' '10200' 
9ttggcatac tagcctggcc tcttaattaa ttcattaatt agcttattta tttttgagat 10260
aggtcttgct ctattgccca ggctggagtg cagtggcatg atgatagctt actatagcct 10320
- eaatctccca ggcttaaaca atcctcctga gtagctggga ctacaggcac .awctaccat ; 10380 ' . 
gcccagctaa ttttttttta attttttgta gagacagggt cttgctctgt tgcccaggct. 10440 
ggtctcaaac tcctgggctc gagatcctcc cacctgggcc tcacaaagtg ttgggat^tac 10500 ^ 
aggtatgagc cacggcacct ggcctggtct cttaactggt tccctaagac agctggaaa^ 10560 
agagaatgtc atggagcatt cctaaccatg ggctccagcc tggctttcat tctgtttctc 10620 
ccctgaaaca acattccttt egtaatattc cgaataacag cttcatcagt ctgtctaccg , 10680 
accactcttc'aggcttcatc ttatatgacc tcccaaactg cactaagggt tgtattagag 10740 
aaaagtggat aeiagttcgga gtcaggctgc ^tgagcttaa atgccagctt cacttaccag , 10800 
ccacctgacc atgagtcagc tgcttaacca ttctttgcca cagtttcctt gtctatgaaa . 10860 
agggaaatgg ctcccacctc aaaaagttgt ^aacattaaa ttcaatcatg tattcaaagt. 10920 
cctgagcaga atgtctggcc atgactggga cttaacagat gttagcattt attattagta . 10980 ^ 
tctgtcagtc ttgaaatgtt ctcttccctt ggctttcatg acattccaca ctctcctggt 11040 , ^ ., 
tttctcttac ctctctggta atacctgttt gcttatcctt ctttgtccag ctctgggatg, ,11100 
ttaccattcc ttcaggcgtgctgtt ttctc cttaggcagt cttacacaca ctcatgactt 11160 
ccttccattg tcctccacac actgatgacc ctaaaatcag tatctccagc ctaaaccttt .11220 
ccactgagrtt ctagacccat atgttgtact atcaacctgg cttgtccatt tgaatgtctt .11280 . 
* ecaggcactt cagactctct tctctagact -ttgctggact ttcactcttc cccctaaaac , 11340 , . ^ ' 
tggctcctct tccactgaaa catgtatg-tc attgagaggc accaccatcc acccagtgcc 11400, 
taagccagaa acctaggaat ccttgatacc tgttctctct catcctgcat atccaagcct 11460 
atcagtttta tctctaaatt atattttggt aggtttactt ctttcctttt ctcccaccac 11520 
caccctgctc caagctacca tcatctcacc tggatgtctg caatagcctc atctcccaca 11580 
gccactctgc accccctaat ctgttctcta •tagagcagtt ggaaggagtg atttttgttg 11640 ^ 
tttgttttgt tttgttttag acagagtctc actctgttcc ccaaggctgg agtgcagtgg 11700 
cacaatttcg gctcactgca acttctgcct cccgggttta agcaattctc ctgcctcagc ,11760 

ctcccaagta gctgggatta aggcaccggc ccccataccc agctaatttt tatattttta 11820

gtagagatgg ggttttgcca tgttggccaa gctagtctcg aactcctgac ctcaagtgat .11880 
ccacctgcct cggcctccca aagtgct^gg attacaggtg tgagccactg cacctggctg 11940 
gaaggagtga t'cttaaaaaa aaaaaaaaca aaaaaaaact tgactgtgtc actctgtgtt - .12000 

gtctctccta .ccttgtatac. ttccacaact- tcccagtgtt cttggaV^a gaccaaaatc • 12060 , ' ^ 

cttaacttgg ccaggcgcgg tggctcacac ctatcatctc agcactttgg gaggccgagg 12120 , 

caggcagatc atgaagtcaa gagattgaga ccatcctggc caacatggtg aaaccccatc ,12180 

tctactaaaa atacaaaaat tagctggtcg tggtggcgtg tgcctgtagt cccagctact 12240 . 

tgggaggctg aggcaggaga atcacttgaa cctgggaggc agaggtitgca gtgagcccag . 12300 

atcacgccac tgcactccag cctggtgaca gagtaagact ccatctcaaa aaaaaaaaaa 12360 

aaaaaaaaaa ttccttaatt tggcctacag tagagccctc cgtaatgtgg cctctctcca 12420 . 

catctccaca acctcctgct ccctgcatrtt cagcctcacc tctcttctgg acaggccctc 12480 

cttctgacaa gggctttgtt cattctgctc cctctgccta gaatgccccc ttactctgtt .. 12540 . 

cacttaactc ctgct^tcg tttagatctt ^acctggatg gctcagagaa atatagaagt 12600 
aattcctcac cctgaaaaat aggttaggtc cctgttttat gttttcatag acctttcctt 12660
- tgaggctttt tttaaaaaag tagttttaat ctcacottta' ttcatgtg^^ ^[ 
atgatatctt aagacctcta atagaacaat ttggtcatgg actgtggggt ttttgcccct 12780 
cattgtgtca gcaetgagca tattgttggc ataggaggga tatttgttga atgaattgct 12840 
agaggtggcc aagagatatg atgtaagtca ggcttttccc tgcccttccc cttccccttc 12900 
cccacatcct tcctatagca gccaccgtgg ctgcagttac tgtaaatggc aagacggaat 12960 
cagttccgga cattgggttg ttttagaaaa ttgcctgcaa gtgteagggt gataagttaa 13020
agctttgtct tttgccctca gaggagctat cccatagtga'gtagaagcca gagaagctga 13080 
ccccaggagt ccttctttcc agcagcaggt cttgagctgc acttctctgt agctacaatc 13140 
caggcaggaa caagccctag gtacctccgg agaggagggc aagagaggaa gaatgagttc 13200 
agctacteta gccaccaaac tgattatgaa ttgccctgaa atctgaaaaa tttcaattcc 13260 
aatcgtaagt ttgttttgtt tcattttgtt ttcttaaatt gtatatttga aagatggcat 13320 
taactaaaga batata ttca atatagagtg gaaaaaatgg aatacttgca tagtatcttt' 13380 . 
^cttatagg tgatttatga tggggagtgg ggtggatagg ttggcagttc ccccaagaag 13440 
ttggaaatga agtttgtcct ctgtgagttg aactaattag atccacaagt aatgaaagca 13500 
gtattgtgtt gtagttaaga gcacactcta gaaccagatt gcttagtttc aaatcctggt 13560 - 
tctgcctttt attatctgtg tactttgggc aagttacttg ccctttgtgt gcttcatttt 13620 
tctcatctag aaaatggaga ggccaggcgt agtggctcat gcctataatc ccagcacttt 13680 - 
gggaggccga ggcgggcaga tcacctgagg tgagaagttc aagaccagcc tggccaacat 13740 
ggtgaaaccc tg^ctcrtaca aaaatacaaa aattagccag gcatgatggc gggtgcctgt 13800
aatcccagct acccaggagc ctgaggcggg agaaacactt gaacctggaa ggcagaggtt 13860 
gtagtgagcc aggattgcac cactgcactc cagcctgggt ^gacaagagct agactcagtc * 13920 * 
^aaaaaaaa aaaaaaaaac aaactggaga tacaggctgg gtgcagggct tacacttata 13980 
atatcagcae tttgggaggc ctaggcggga ggattgcttg aactcaggag tttcaagatc 14040 
agtctgggta acagagcaag acctcatccc cacaaaaaat caaaaatita gccaggcatg 14100 
gtggctcatg cctgtggtcc cagctactca ggaggctgag gcgagaggat tgcttgagcc 14160 
caggaggttg aggctgcagt gaaccatgac tgc'accacta catgccagcc tggatgacag 14220 
agcaagaccc tatctcaaaa aaaaaaaaaa aaagaaacga gccaggcgcg tttgctcacg 14280 , 
ccagtaatcc cagcactttg ggaggccaag gcaggtggat cacttgaggt caggagatcg -14340 - 
agacta^cct. iggccaacatg gitgaaadccc atctcaactg aaaatacaaa aattiagccag -14400 • - 
gcatggtggc atgctcctgt agtcecag<rt actcacttgg aggrtgaggc acgagaatcg ,14460 ' 
cttgaaccca ggaggcggag g^tgcagtgg gccaacatca tgtcactgca ctccagcctg 14520 
ggagacagag cgagactctg tctcaataaa taaataaaea taaaataaaa taaaataaaa '14580 
taaaataaaa taaaaaaata tggaggccag eaggcacggt ggctcacgca tgtaatccca 14640 
gcactttggg aggccgaggg gggcggatca caaggtcagg agatcgagac catcctggct 14700 
aacacagtga aaccgcgtct ctactaaaaa tacacaaaat tagccaggca tggtggcagg 14760 
cacctgtagt ccctgctact caggaggctg aggcaggaga atggcgtgaa cccgggaggc '14820 
ggagcttgca gtgagctgag atcgcgccac tgcagtccag cctgggcgac agagcaagac 14880
tctgtctcaa aaaaaaaaaa aaaaatggag gttgggcgcg gtggctcgcg "cctgtaatcc "14940 
cagcactttg ggag^cgag gcgggcggat cacctgaggt caggagttcc agaccagcct 15000
ggccaacatg gtgaaacctt gtctctecta aaattacaaa aattagccag gcacgatggc 15060 
aggcacctgt aatcccagct acttaggaga ctaaggcagg agaatagc^t gaacctggga 15120
gatggaggtt gcagtgtgct gagatcgcgc cactgccctc cagtagagtg agattccgtc 15180
tcaaaaaaaa aaaaaaagaa gaaatggaga tacaoactta ctacctacct ccttacaaec 15240 
taccetcaca gtattactgt gaataaaagt gtgtgtagca ctgggaacac tattcacaga 15300
gcactcatga atgtttgttc tttgttatta gttactagag aggcaaatgt ctgccagggc 15360 
tgaataatat gtgtgaattg gtgattgtcg cacatatcta aa^aagtagt tatttttttc 15420 
aattaaaact tagtttaaaa accaatataa ggccgagcgc agtggctcac acctgtaatc 15480 
ccagcacttt gggaggccga ggtgggcaga tcatttgagg tcaggagttc gagactagcc 15540 
tggccaacat ggtgaaaccc tgtctctgct aaaaaaaaaa aaaaagtaca aaaattagcc 15600 . 
aggcatgatg gcaggrtccct gtaatcccag ctaettggga ggccgaggca ggagaattgc 15660 
ttgaacccag gaggtggagg ttgtagtgag ccgagtttgt gccactgcac ttcagcctgg 15720 
gtgacagagg gagacactgt ctcaaaaaaa aaaaaaaaaa accaaaacca atataataaa 15780
taagtggcca gcaatgaaac agaaagtgaa aagttagtga agcaaaacta gtactgtatt 15840 
cagataaaga tgctgaatct agatttggtc accagaatag ggtcctttgt ggcaacctgg 15900 
gctagtttgg ctgactcace actgccagga tgaaatttet -ttcagtggert actcatttcc 15960 
ctttat-ttta agtccatgct cacagagcaa ccttctgatg cctaattcag cttcctggga 16020 
tacttaataa caggaagggt ctggaagtag tacctgtata ggggatatga gtgttctgat 16080 
tttaatagtc aattcataag tgtacagagg gtttgataaa tggttaggtc agaaccatca 16140 
cagaatgtct acacctcttt ggacattagg aaggtcaaaa acctgaaagg ccaaaagcta 16200 
ggcctagatt agggtcattc accaagaaaa catcagcctt gaagagttct ctgggtggtc 16260 
caccagtcaa ccttcctttg atcacacctc cttcctcgtt gcttctttaa gcattgacct 16320 
gtaatgggta tggaattttt tgctcaccta actccttcct tttacagagg aagaagttga 16380 
agcccagaga gatttaatgg cttgcctaag atcacacgca gattttctgt taaccagggt 16440 
gatttttcag gtgttccctg ccagacgagg gcttttttcc ttgaattgcc tagagatttc 16500 
ttgagatatc cgaagcattt ttcccagtgc agcctggaga aggatgtccc tgtcaacaea 16560 
gcatttgtta ctcaatgtta gacattcaat tttctaatta gtatcatgga gcaacagtgg 16620 
atgattatct ataaggggtt gcaattccat gcttatgtgc ttacagccca tatagacaaa 16680 
tatcagctgt taaaatgaca aggcagtaga gatgtggccc caggacaaag gcatactctg 16740 
ctgttagtga acactagttg gccagcaaat ttcacatggg catatacacg gccaactgta 16800 
gactttaggc atttataccc attcagagag ccaaactggc aactaaagatcagcatt etc 16860 
tttggcattt cagctttgcg ttctgttaaa aatcactgct tgcttaaata cctctgatag 16920 
ctcttcactg cctgtaggca actctttagc etagcagact tggtctttag tgctctgccc 16980 
ctactctctt ccaccattct ggcctcctgt ctaattgctg cccatatgtg ccatgcact^ 17040 
gagcttacag acetgeteag cgttatatga gcataccata ctctttatgc ctcagtgcat 17100 
ttgcacatgt tgttccttca ggccagaatg cctgttactg cctggcaatc agcctattag 17160 
agtctgccaa taccatccca tcttctgtgg aggagccccc cgccaaatcc acccatacct 17220 
ctccccacca atcagagact tcttctctct ttgttattct. cttcgttatt ctcttcatac 17280 
ctcagttata tccatttcag tatttgttta cacatctagc atcactctta gagtgtgaaa 17340 
ttctccMgt gt99a9cc9t atctagtttg tctttgtatc ccogag^a gcaaagtgcc 17400
tagaatgtag tgggtgctca gagtgtttgc tgggtgaetg atgtatttgt tgaacgactc ^17460 
tttggacact tgaataaagt ccatccagta tgcaccatta c^tctcttc gctctacaat' 17520 
attcttttag gcaagagctt atcttttgag gtgataagat aagctcaaac ttatgtagac 17580 
toagacctca gtctgtaaat gtcatcccta agtcttaaac catcaaaacc agggcctcaa' ' 17640 ' 
ggaatggcat gccttctgca actgtagcaa cctgctgtgc ttattttgcc gtgtttttca ' 17700 
tttttccccc aaaagctaga gtcccttctc ccatgggcag tgctggaagt gtgctaaeaa ' 17760 ' 
attctttctc catactgctt acgattacaa aaaaaaccct cagcatctca^tgccagactt' 17820 ^ 
gagttaaggt tgttttcttt tgtgtgtcag ctgtattctg gtcatgactt cctgatgatg 17880
ccctatagag attttgctga gatcagaggg tgctccactg ccatcagtag cactgactct " 17940 ' 
tgcagaagca ccgtttctga agttggctaa tgtcatccct cacgtttgtt tgtttgaaat 18000 ' 
ttgttttagt tccagagata gcactttcat ggaatgacgc tatrttctag aatcactttt"' 18060 
tttttttttt tgagttggag tctcgctgtg tcgccaggct ggagtgcagt ggcacaatct ' 18120 
cagctcactg caatctccac cttccgggtt caagtgattc ccctgcctea gcctcccga^ ' 18180 
gagctgttac tacaggcgca cacccccact cctggctaat tttat'gtgtt ttagtagaga 18240 
cggggtttca ccgtgttggc caggatggtc tcgatctcct gactttgtga tctgcctgct 18300 
tcagcctccc aaagtgctgg gattacaggt gtgagtcacc gcgcctggcc tagaatcacc 18360 
; ***ttatacc ataacgtgag caccactgcc gcgtcaccaa ^.ggaaa^agag aggcagctac ' 18420 
tgtggggtta caaatgggta agagtggcac caggaaggtg aaagtcticta cttagccaag 18480 
gcttaacaaa atgtcaatca ccaaacattt atttattaag ctacgttcag gataagaaga 18540 
tgaacaagct atctgtacat tcattttctc gtttgtaaca aggtaatgat agtgatctat ' 18600 
cctgcctgcc tctgagggtt attgtgagaa taaaatgaaa tcaagtggaa'aagcacttag 18660 
gaaaaagaaa agcattggtt ttcaattgtt agtgtggatc agaaacactg gggcttgtt't ' 18720 
aaaatgcaga ttcttagccc cagtctcagc gattctgatt ctgtatatct gaagtgggac 18780 
tcaggaatct tgattttcaa caagctgacc agagggtcca atgctgctat tcctttagtt 18840
acactttcag aaatattact gtaaatcaaa tggcaagaat aaaatagtta tttgaggeag 18900 
ttttagtatg ttggacctgg agtccaaaga cttgggtcaa actccagctt tgtcagttcc 18960 
tagacctgtg accttaaaca gcaaccttct ctgtgaacct tagtrtccctc^ggaaeggct 19020 
ctggtcaect cctgctgtac tccattgatg actcaccaca taaggctccc tgggagtccc 19080
ccaaaccttt gctctcttaa ctccttttac agcctcctac atctcctgca ggtgctgtct 19140 
tctcctcctt tttccaggcc ctgctctgac acagcattca ttctcctctg ggaagggttc 19200 
cttcaatgtg tctccaagca catcacaccc aggaaggacc ctgtggceat aictgtctat 19260 
eaccagatca aactacgtga aggcaggcac taggtactgt cagtgcccag c'ataggcctg 19320 
gcccatacca ggtgtccaca gatgcetagt aaagaaacct atgatticagg acccceatga ' 19380 ' 
tgagcaacta tagcactaga acagtgataa taactaatgt ttataatgca tcttcagttt 19440 
acagagggct tttgtactca tcatctagtt tagttcctgc aacaacctct tgaggaatat' 19500^ 
agcacaagca ggacaaggga agcccagaga tgttaaataa tttatccaag t'ttatgctgc 19560 
tgggaagggc agcactgaaa ttaaaagaaa agttttctga gctcaaatcc catgcccttt 19620 
cctcaatgtg agctctagca aggtattcag gaatcctgcc tctacagttc agagcctcaa 19680 
attfctgggt atgttgagtt cttgtatctg atttttctag atttcctgcc cocattctta 19740
ctgtctggat atcaggaaag agtttatcaa atgcctgtgg aaatccaaga taaggtctca 19800 . , 
tgatgagtaa ccdagtgaaa acatgaagtc aa^tctaact agtcactart atttcactac 19860
tgctgactcc tgatgatcag ctccttttct aagtgrttac tgtccactta ttecatcatc 19920
tgcctagaat ttatgtgaag gaatcaaagc aaaaggatca <taaggcttcc tttttccagt 19980 
atgtttttcc tcctttttga aaactgggcc agttagctat ctccattttt atttcatgaa .20040 
tacatcccca gcgccbggta tatagtagat atggaacatt acactttgga gatattgcac 20100 
ccattctcca gtttctccaa agttactaac aatggttcca tcactgtgcc aacatatttt 20160 
cttttttcaa tatattggga aataattctc ccagtctgaa- aatctgaaca catttcatgt 20220 
gaettggtat cctcatatgt cttgggcttc caattctcca ttcctagttt caagttcatg 20280
aactgtaaaa caaaggatta gactaaatct ctaaagttct atccagatgc caaattcttt .20340 
tctctttcca tgatacctaa gatagatgcc aaatattgtc :ttttacctgg tgtttgtgaa 20400 
catgacatca cattacagga gtagcagata ctaaactctc actctgtaaa acactgactg 20460. 
agttccatga gccagatact gaagtgagct tgttcacata tgttctcatt taatgctcat 20520

aaccctgtga agctgggaat tgctgggaca ttttatttat ttatttattg agacggagtc 20580 

tggctctgtc acctaggctg gtgtgcaatg gcatgatctt ggctcaccgc aacctccgcc 20640
^cccgggttc aagcgattct cttgcctcag cctccgcagt agctgggatt acggggcaca 20700 
caccaccac» tccagctaat tttgtatttt tagcagagat ggagtttctc ca^gtt^gcc 20760 
aggitggtca cgaacacttg acctcaagtg atctgcctgc ctcagcctcc caaagtgctg 20820 . . 
ggattacagg catgagccac catgcctgcc cgggaccctt gttttagaag gatgactgct 20880 
gctataatgt agaaagtgat ttggaagagg ggaggagtgg ggcacgaaag atggttagta 20940 
gatgggggtg gtaatgctta cctttcagta tttggaggct tcggagtcct caaaaattct 21000 
cttccttgat tggagtcctc ccagecaata gagggcttea cacaaacagt ttcttgggtt 21060
ttgaattgtt tgaccagage tttcttccga caaaaggttg gggtgattca ttcacttacc 21120 
acaecttgcc tgaacattca cttggggctg ccggttatga aggctattgt tctccagcct 21180 
gtcacagacg ctttgaagac ctgtgcctca gctggttcta aggagtcagt ttgttpagct .21240 
ccgtgccagg tttccaactt atgaaatgtg ctggagatta acacctctcc tgecatttta 21300 
tccctactat aattgccagt caaaggattc ctgcagttgc ctctggcagc cataactgat 21360 
gaatgttctg ccagctgetc tgaggaccta gaagagcagt tttctatcca ggaccagttt 21420 
ccaagggtgg gagggtgaaa tatatcctcc agtgtgacat ttcatctccc. egtgatgggt 21480. . . 
ggcttgggcc ctttgaagtt ggetctgagg aaccacacac" ttgggtctga gcagccagca ■ 21540: 
gcttatcaca tctggtgatc aatccttcaa aggttcctcc tgaagtctga att^t^ggag 21600 
^aaatgga ttccacctgg gaggggcttc tgcttcaact caggacatgg ggagaaggct 21660 

gttcctcttc cagggggagg cagttttcat ggeattgaga tgtcctctca cttat^ccc .21720 

aeccacecac caagtccttt gtaagaggag tagggggaga ggagagcgcc tgcagcctcc 21780^ 
tgctcacatt cctagacace gactcactga gcccgtcgcc gctggaaeag cagagctgtg 21840 
tgaaatgtca agaggagtta tgctcatagg ctccctggec tcagtctctt tgtggcttge 21900 
atattcttcc attagtactg tgtteatcac atggaaatca gagggtacaa ttaaaagata 21960 
atttgctagt cccagactta atttggggcc cccttcttgc ctgattgaat .tacaggggaa 22020 
cataatagat ttttggtgag aaatagttgt ctgtgtggct gggagaaaga ttgctcccag 22080
ctctccagct gggcagccct tteagtatcc cgtatgttat ttccccactt ccagcccocc 22140
-.tcacctcctc tgtggccctt'-gtg^gtcccc tcggc^agga tcctgacctc ct9(^ 22200 ,*•- ' 

gtttaaactc' aaottgagac ccaaggaaaa -tagagagccc tctgcaacct cataggggtg 22260 ' ■ 

aaaaatgttg atgctgggag ctatrttagag acctaaccaa ggccbagaca gagagagtga 22320 
cttgctaaag gccacatagc tagcccacag tagttgtaac aatagtctta atgatattaa -22380 
tggctaacat ' ttatcaacct ttaatgtgtc ccagactttg tgccaagggc ttacatgcag 22440 
tgcattgtcg cattcaaacc cagacagtct ggctctgggc ccaggctgag ctttggtata 22500 " ' 
gcatggtaga acgttgtcta taatgtetag tctgggttca aatcctgget tca'cttctca 22560 
catttacagc tgagtgacct caggcaagtg atttaacctc cctgtacctc agttgcttta 22620
. tctgtaaaga gaaoaatcac agcactgtgg aatagtgggg'gttaaaattc attcataeaa 22680 
gtagtgctgc aagcaatgtt taatacaggg tgagcacctg ttcagigctt ccttcttctg 22740 
gctgcctctg gggetagagt 9tggtgtctt cgtggtatag atagatagat atggctgagc ' 22800 
tctgcaca4ui caccaagagc tgttcttcac tattagaggt agtaaacaga gtggttgagc 22860
tctgtggttc -tagaacagag gccggcaagc tatggcccat tgcctatttt aatacggcct 22920 
gtgattgatt gatttttttt ttctttttga gacagagttt cactcttgtt gcccaggctg 22980 
gaatgcaatg gcacgaactc agctcaccgc aacctctgcc tcctigggttc aagcgattct 23040 
cctgtctcag cctctcgagt agctgggatt acaggcatgt gccaccacgc ctggctaatt 23100 
tttgtatttt tagtagagac agggtttcte catgttggtc aggctdgtct'cgaacttcca 23160. 
• acctcaggtg atctgcccgc' ctcagccttc caaagtgctg ggattacagg 'cgtgagccac '23220 * . . 
catgactggc ctgattgact gattttttta gtagagatag ggtcttggtt tgttacccag 23280 
gctggtctca aacttctggc ttcaagcagt cctccctcct tggcctctcg aatgctggga 23340 
ttataggcat gagccactat gcctggccta tatgacctgt gatttttaat ggttagggga 23400 
aaaaaagcaa aagaatgctt tgtgacatgt ggaaattaca tgaaactcaa atatcagtgt 23460 
cccagcctgg gcaacaaagt gagaccctgt ctctacaaaa aataaaaaaa aataagccag 23520 
ggccgggcgc agtggctcac acctataatc 'tcagcacttt gggaggecga ggcaagtgga 23580 
tcacctgagg tcaggagttc aagaccagcc tgaceaatat ggtgaaaccc tgtctgtact 23640 
aaaaacacaa aaattagccg agcatggtgg catgcgcctg tagtcccagc tacttgggag 23700 
gctgagacaa gagaattgct tgaacctggg aggcggaggt tgcagtgagc caagatcgcg 23760 
acactacact gcagcctggg caacagagcg agactccgac acacgcacgc acgcacacac 23820 
acacacacac acacacacac acgctgggta tggtggccag cacgtgtggt cccaggatgc 23880 , - ' 
actggaggct taggtaggag g&tcacttga gcttaggigg itgagactac.aatgaaccat 23940 
^^tatacca ctgcacttta gccagggcaa cagtgtgaga etgaatctca aaagaaaaaa -24000 ' 
aaaaaaaaga aaaaaatctt tccataagta aatatctgtt ggaacatagc catgtccctt '24060 ' - -. ' - 
agtttatgtt ttatatatgg ctgcttttgc cctataatga cacaattgag tggceacgac '24120 ' - ■ . 
. agtctgtatg gcctgcagag cctaagatat ttgctctctg gcectttaca 'gaaaaagtgc 24180 " - - ' . 
cttgacctgt gctctagagc catatgtacc ag^ttgaaa ctcagcctca cagctgggtg 24240 
tgatggcacg catctgtagt cccagctact ctggaggctg' aggtgagagg atcacttgag 24300 
tccagaaggt cgaggtcaag attgtagtga gccatga^g catcaccgca ctceagcctg 24360 
agtgacagag agagaccctg aetcaaeuiaa aaaaaaacaa aaaaaaaaaa eaccctcacc 24420
acttatcagc tatttgtctt gagaatagtg acataacccc tcagaaccta tttcctaatc 24480
- tgttaaatga ggctgatgac gtttcctcct tttactggca atttaaacat gatggataat^ 24540 
aaatgctkag cacttaacac agggcctaga agatattaac igct^aat^a atggtagctt 24600
cttaacagta ttcaaaccca tgtgctctta tcacatgcat,tgttgtccct gtgtccagtt 24660 
ggtggaatgg gaaaaggctc ccttgtaacc ccatctacca . tctittatcag actttcctgc 24720 
catggttcac agtaagagat agaagctgca cggtgacttc tggctcttta coatggtgag 24780 
cggtgtgtgc ctggtaaggg agagctga'tg tcactgcccc aaatccagta gtgagatctg 24840 
agtgttctgg tttcctccag cagccttgct ttttccttta caatcctgca ggcagggaga 24900 
caagggettt ctacatggta ggctctggtt tggtcatcgt cacaactggg ggctgttcag 24960 
gtgggctccc attccagata cctaggctta tcaatccctt ttggcacccc agg9cttt^t 25020^ 
cteectcatg ccccattttt cagtttgaaa agcatggtta tcacaggaca agtaga^gaa 25080 
gctccactgt ccactgaggc caatggatgg tgttctgcat gtgaacactc agtgaatagt 25140
gagtgaatga gagtaacctg ggctccatcc tatttgcaga gagctttgga aaagattttt 25200 . , 

ctccrttaaag agccagaatg aagcctggta gtgggagagc t<M:agctcta gagtcacatg 25260 
agcctaeatt taaattccag ccctgccact gactcccttt ttgaccttga gtgagttacc 25320 
taatctctct gtacctcact tttcttgtct gtagagtggg aataattcct gtctcagaga; 25380 
aataaaagag tgcatatagt gtttgccaca tggagacaca "tcaggtgtag gttaatactc 25440 
tgggccttgt ttccttattt gcaacacagc cctgccctgg agtggaagtg gcacctccca 25500 
ttggtcagct cttgaggctg tccccaggac aggcagaggg agggaatgaa tgggagccct • .25560 
agtgccagga cagaacagat ggcagctcag agctaggatg gctctctgga cctgtctctc 25620 _ 
ctaccagagg tccccccgtc tggtgtggct cttcctggac ctggcatcct ctgctttttt 25680 
tttttttcca cctccaagca gaattactgt cctgtaggca gctcctctgc ttgaggacat 25740 
ctggggccag atatgttcac actctatcct gccttgccct tccctgagct caggatggac 25800 

gctcaattgg tcccagttat tgtctgcagc gcctgcctgc agcctcgatc cagcccagct 25860 

ccaccccttg cctgcaaggt ctgtttccta acagctgctc caaccacaca cctcggttct 25920 
gcgggagccc ctcctcttcc tccctccctc cctcattcag gggtgggact gaagaagaag 25980 
gctaacttga cagcagcgct tctttcttag ctagtcaccg gcccctgctc aagaatgcca 26040 
gtgtgtgtgt agcctccaca gagaggtcgt tttctcggag tccagagggg ccgcctgagc 26100 

ttctgagaac tagggaggag ccatcccagc catgagcccc -tgtgggaatc tgctgggggc 26160 

caagtggcct ggagtcctca ggctcccgca gctgctc<:gg agggagaggt gagctcaggg. ; 26220* 
cagcctgcct gcagccagag gtgccgggag ccccgggcct gtcatggtgg ccatctacag 26280
ccggcctgag gcagtcacag acggatttgc agctgagcct gtctatctgg tgtgggaaga 26340 
agatggggag ttacttgtca gtcccggctt acttcacctc ,cagagacctg tttcggtgag 26400 
ttggtctccg agttcccctc tccatctctc ctggcccctg gtcctgagag gagggtggtc 26460 
tccc^aatc tccttetcac ttagtccttt accatcggtt ctgccgggca gaagccagcg ,26520 . ^ . 
gaggttatac ccaaggagaa tcggccttgt gaggtacccc cattatgtcc tggaagtggt^ 26580 
gaggggaggg atatacccag aaggaacttc ttagggagct ccagctcccc ttctatccca 26640 
gacaaacctg aaggagcctc caaaagatgc cactgacctg cccattgtag atgttactgc 26700 
ttccgggggg aatagcccaa atagagtgct gtttccagct ctcacatgtc ttacctgcgg 26760 
gccatgctgc ctgcccagga atttgte^ca acaagcagga tgggcaggtt ttgccaaact 26820 
gtggaaactg gcaagtcctg ggtgtg^gta gcctggtaca cagtaggcac cttataaacg 26880

tttgttctct taatggcagg cacatttgcc. tct^ccttg-aagggcttrt " '-" 

" gtgaatgtag t'tgctgggga aagacctggg cgagtgcttc'taagactgga' gcaatggg^ • ; 

ttagagtgtt cctgagctgc tgggccagcc cccacacctc ctcagtccct aggcctaagt' 27060 

acctccacga gcctctctct gtggggcttc tcagagggag atgtggaaa'c tctaectcta'' 27120 

acctggcttt ctttgctcat tgccccacbc cacetcccat agaaactccc cagggggttt 27180 

ctggccctct gggtcccttc tgaatggagc cattccaggc liagggtgggg tttgttttca' 27240 
.ttctttggga gcagcctgtt gttccaaaaa ggctgcctcc ecctcaccag tggtcctggt ' 27300 
cgacttttcc cttctggctt ctctaagcta ggtccagtgc ccagatcttg ctgccgggat' 27360 ^ 
actagtcagg tggccaggcc ctgggcagM aagcagtgt^ ccatgtggtt ttgtggaatg 27420
accggaecct ggtagattgc tgggaagtgt ctggacaggg ggaaggggga agggaactgg 27480 
tcctcaatgc tgactctacc aagcgccctg ctagacactt tatcctttaa tctctcaaca 27540 
gcctaaagag attatatatc cccattttac agatgaggca accagtttca acagagttaa '27600 
catatggagc ctcactgggc agctttttct gtcttcctga ctttctctca tocttcaggg 27660 
ggctgcaggt ttgttttctt ctcctagtgg agaggaaatt ctcaggtttg ttttcctctc 27720 
ctagcagaga gtaaaaaaag ggatagtttg cctgacttgt tgaaggtgtg gctgagattg 27780 
ttttctaaag agccaatgga aattgatctt gagtttagga gaaagctttt acaigtggaa 27840 
ttaagatgcc ■ aagtgttgaa gtagccacat ttcaggtccf cattaatttc tcrttaatcct 27900 

gggaaggcag cttaggagaa gggttgttcc tttaggagcc aggaactata ccccttttac 27960

ccttggagag gcagggaagc cagggaggac acaacttctc aggaagagga gaagctagag 28020 
cagatagtga actctcaacc tgaaccttta agggccagac cactaatgcc acccaagtcc 28080 
acctgccgtt tgtcttgttc tgtcccaggc tttctggaga acctgatctt cttgccccta' 28140 
cccccaagct ccgtttgccc agctagagtc tggggggtac tgactgactt tcgtagacat 28200 

tettcccttc cccaaataag aggccacatt cctgaagtca cttctgaaga' gatagctgcc 28260 -c 
acacagggct ctttcccccc agggagggac cacccagacc ctctgctctc ccaggtatcc 28320 
gttaccacat cactacctgg tcagaaagct gtttctgcca ttagcccctc cctcttttat ' 28380 
tataggatat cctcaagggc tcctctttgg gcctcagttt catccttggc agaaagtaga 28440 
agctagactt cttgggctcc tgaacagggt ccttgctgga ttctgtgaaa caaattaagt 28500

tcttgaccct aggcctctgg gggagtacaa agtctatggg agttctgggg ctgtggttgc 28560

aaggaaagtg acgcaaccag attccatggg gacatgatca 'g'?^?^^^^^ 9tgagggagg 28620 .' -. 'V- . 

aaga'gggagc aagggaatga agaatacaac ttctgtgtcc' eatacacccc tgcctgaeag. '28680 . 

gccatacata ctcagcagag aatgcaetgt ctttcctacc acactagcgt'gaggagtgag' 28740 - ■ ■ 

ctgcaattac cactgtgctt ccaagtaaga aaatacctca aattggaatt tacaaoagag 28800

9taaattagg gagtggcttt tgtcggacat ctttaaagca tt^ttctttt tat^gaattt 28860

cacttaatgt ccaatactga tttaatgagc ttgggtttac acattatctc ttgaagaaaa ' 28920 

caaatgaacc tttgtgttcc aaagcaatcc atgtttaaag ggaaaaaatt atgcataact ~ 28980 • . • • 

ctgcccagct tcacagtaac crtttggcagg tgccttaggt cctctgggac 'tcttttcctt ' 29040 • 

atctgaaaaa tgaaggactt ggatcaggtg aatggttccc agctctgcaa cttatgtggc ' 29100 

tcctcagagg cacacaagct cttttccatt atttgccaaa taatggaggc ectgtettta 29160
act^cagtac aactacacaa aatacttgaa actaca^ct tcctggftttt tggttggaaie 29220
tgaatcagtg cactctagca acacttattt cttgctgttc gtaggcttca ttatgtgttt 29280
ggttaatttt ttaaaacaac aataaeatat tccataataa ttacagctta attggcagao '. 29340 
tgtttcagtc tataggatct gcaggaagga ggagtaataa agggattttt gactgagctc 29400 ^ 
ttatggaaca gagtctctct aggcccctgt catatctgcc cttctgggcc ctggggaaaa .29460 
gttggcatcc ccagttgtgg tgctctccag gtgccctcag gctgtggtgg agggagettc 29520 
ccattctctc cttcagccca ctcaatteag aggetagggg ctgaaagaag cttctctaca .29580. 
actggctgtt cactgggagg ttaagggatg accatccagc caggccttcc tcaggacatg 29640
ggagggctta tgctttaaca tgtgtaaatc cactgcaata atgactggt^ cttttacccc 29700
ataaggttga gaatttacct gtaaacattt ttgtctgaag aatttggatg taagtgaggg 29760 
ctgggcctct atcttatctc acttggcttc tctcagcaca gcaccttgcc tgcttgttet 29820, 
taeacatcct agatgcacag taactatttc ctaattatte gaaatctatt agaatcaatt 29880 
gatttcagct gggcttggtg gctccttcct gtaatcccag cactttggga ggctaaggct 29940 
ggaggatcac ctgagtccag gagtttaaga ccagcctggg caacataggg agaccctgtc 30000 , 
tctacaaaaa ataaaaaatt agccaggcat ggtggtgtgc acctgtagtc ccagetactc 30060 
aggaggctga ggcaggagga tctcttgagc ctgggaggtc agactacagt gagcaatgat 30120 
tgtgccactg cactccagcc tgggtgacag agtaagactc -tgtctct'taa aaaaaaaaaa 30180 
aaaaaagttg atttctattt. gga;tagataa ataattcatt ttaggacctt tetttttc^c 30240 , 
. ttacagaaat ctgtttcatt ctgggctgag aagcaggtcc atattgctag gcataggaga 30300 ^ 
aaaaggggtc tgtctgcatt tgcccttggt ggtctcaaat tggggaggga aagaaatgaa .30360 
cacttactgg ctacct-bctg tgagccaggc atca-tgcaag acatctgtac ataatttaat .30420 
tctcataacc ccataagata ttattagcaa tgtacaagtg aggaaactga ggctcagagt .30480 . 
catgaagtaa ctggccttgg gtgacacaga tggtaaatgg cagagaagga atatggatcc 30540. 
aggtcttgaa agagaaaatc tcaactgatt atctttttta aaaaactcat atgttctctg 30600 
ctgactcaaa aggtctctgt gtggatctgg gttgacccac tgaaetgacc atcagggttc 30660 . 
catgcacttt gtatctgccc aagccetcag aacccctcag taatgttttg gaagatgagt -30720 , 
tttggagg^t: gtccttaggc atagcctcag cgtatgtagg cctctaggtg atctccccta 30780 
acctgaggat ttcagctcaa ttcactctgg ctectcagga eagtgggatg actggttcag 30840 
acctcagctt taccacctcc cagctgggta ctcttctacc tacagccagg gcagattttg 30900 . . 
aetttcactt gaaacttcca aaaattgaaa ggtagaaaaa cagcettggc . tttgggaaga 30960 
acgtatgatg ^ccatggcct ctaagcatct gaggtgggac atgttcgagt agcaccttac ^31020 
agttccaaag tgtgttctgg gttctttgtt taaaagaaca gagactgctg gggaattgaa 31080 
cactgtgaag tatatgaagg aggagaattg tgctatttaa catteagtac ttgggctaaa 31140 
ggagaagcat cacgaagtgt taacactcaa agggtcttga gctgtcaggg etceagettc 31200 
cttattttca caggtgagaa tcctgaggct cagctgttga gatgtgctgt eteactccgg 31260 
tgacatagta cagtggatgt ggctttgcag ccaagcacac atagcttcac attccagctc -31320 . 
catcaattat gtattgggca gctttgcaga atgatttgac tttaactctg cttttcagtc - 31380 
ttctgtaaaa cagggataat cctgctaccg tagggttgtc aggattagag ataa^taaa 31440 
taaggtacct catataggac ctggattatg gctggcattc aataaatagt .agctgttaaf* 31500 
tgatagetaa gctagaactc tgaagtct^c catggcaact tcttaagtgg tctgagaacc 31560 
cagtt^tgtt ctgtggcaaa «cacagctta gggatccata cccagccctc ctgtcagctg 31620
ttcaccttcc agttcttcag. agacatgtgt ggcagtgact' ttggccacat Vgctggctgt ' 31680 
gccctttaaa ggcattcctt gacacagata tgtggactgg tgacgttgct ctccagecag 31740 ^ 
gtgttcttcc cagcaggctg gcctggctgt ctcctgcatg cetgtacttg tttgtctccc 31800 
tgctccctct cctgggcctg gccagagcta cttgcagcaa ecaaaagcag gatattggca 31860 
atggaaagga gggtgtgttc tggtgctccc atgccctgcg gcgcacatac cattgcaagg 31920
gcgtaacaga gcccaggcct gcatttgggt gcaaataagt ctgcacacag aagaaaagaa ;31980 
ggacctggtg accaggagcc atggaaccct tgtgctcccc tacctgggct actggttctt -32040 
gccactccta ccattttcag tttggaaata tttgttaagg ctttgctctt 'ccaggtcctt 32100 
tgcttggtgc tgagtctacc aagagtaagt gggatgctgt ttttgtcctc agggagctaa 32160 
cagtctagtg aagaagaaag atggttgccc aggoacttct aagtcagaag gcaggaggca 32220 
agaaggaagc ccctgctcct actgccagcc ctctgttggg cacccca^g ttcttcagaa 32280 
ccacatttaa tcctcactgc aggccaggca tagtggctca cacctgtaat cgcagcactt '32340 . 
cgggaggcca aggcgggcag atcacttgag gtcgggagtt cgagaccagc ctcaccaaca 32400 
tggggaaacc ccgtctctac taaaaataga aaaattagcc 9ggtgtggtg gcatgcgcca 32460 
gtaatcccag ctactcagga ggctgaggtg ggaaaatcac ^tgaactcgg gaagcagagg 32520 
ttgcagtgag ccgagattgt gccactgcac tccagcctgg gcgataagag caaaattcca 32580
tctcaaaaaa aaa'aagaaaa aagaaaaaat cctcactgct 'accttgaaag taggtgatga 32640 
cattgccatt tcacaaatga gaagtgaagg ggctagccca' agatcactta .ggtggtaaat 32700 
ggtggtgcta agattagaac ctcagatcat ctagggaaaa acacagatat gcacagagtt 32760 
aaggggaccc agggtattgt ttgtcctctt gtttcacagg tggggaaaca acccagagag 32820 
ggaaaggggc ttgtccaagg caatttagca cccaagaact tgaacccata tctctctcct 32880 
cctcatttag agctcatccc acatgtatct tatattgaga ggagtgtgag ccacatacca 32940 
agaacagtct tcccctctgc ctccaacctc actgtgcagt tttgagacac ttcacagcca 33000 
taetcttcat gccataccca gcccttaaga ccetgaagtt ecccttccat aagacaagta 33060 
ggaaaagcta tagggtaaaa atagccatca gtgtttgttg agcacccagg aggaattggg 33120 
cactccagaa agataaaggg attctcaggg acttgcttct ctagacttcc ctagctcagc 33180 
tgcttcaact cattcctgcc cctcttctct acctcccgca gtgctcagaa gtagtagaac 33240 
tcactgtggc ctctcacctt gcattgttga gttttattta gactttctct tcctcaactc 33300 
ttcataagct catgaaaggt gaagtagggt gccctgtgta tttatctttt atatctgcag 33360 
t'gcttagcaa gttataataa tgcacttgcc tggtiaaaagg 'ctttctctca tacattagct ' 33420 
tatttcctct tcacattggc tctttgtagt aataggatgc tattagttat tttcaatgag 33480 
agaaagctac taagagaagt tgtccagcta gtgacagtaa gtggctgata aagtgagctg 33540 
ccattacatt gtcatcatct ttaatagaag ttaacacata ctgagtttct actatattgg 33600
gtcttttttt tttttttttt ttttttttta gagacggaat cttgctctgt tgtccaggct 33660
ggaacgcagt ggtgcaattt tgggtcacca caacctccgc ttcccaggtt caagcgattc 33720
tcctgcctca gcctcctgag tagctgggac taccagtgca cgccaccacg cccggctaat 33780
ttttgtattt ttagtagaga cagggtttca ccat^ttggc caggctggtc ttgaactcct 33840
gaccttgtga tctgcccgcc tcagcctccc aaagtgctgg gattacaggt gtgagccacc 33900
ge^ccctgcc tatattogga cttttatata egctAtctct Agctagctag ctAjc^gct 33960
ataatgtttt ttgagacaga gtctgactct gtcacccagg ctggagtgpa gtggcgtgat 34020
ctcgactcac tgcaacctcc acctcctggg ttccagtgit tiert-cctgcct .cftgccicccg 34080
agtagctggg attataggtg catgccacca cgcccagcta attttttgta tttttagtag 34140
cccaggtttc accatgttgg ccaggctggt ctcgaactcc tgacttcaag tgatccaccc 34200
gcctcggcct cccaaagtgc tgggattata agcataagcc actgtgccca gctgctctct 34260
atatttttaa tacatattat ttccattaat tttcacagca gttcatttta tagatgagga 34320
aactaggcca gagaagtaaa ctatcttgcc caagatgatg taactagtaa gtggcaggat 34380
caagattcaa accaagcaat gttcaaacct cttggaagca agaatgtggc cactgtggaa 34440
ggtgcaaggc cttgacaaca agaataggga aaagaaggaa ctagaaggaa agagatggca 34500
tgggctcagc aggccaggga gctcttagct gtgtgtgttg ggaagctcsag aagggaggaa 34560
gaggttgtct gtgcaggtaa gtcctgagaa cacaccagac ttttgagagg tggagcttca 34620
tagccaggtc attaggggag aagggagcta tagatttttt tttttttttt tttttttttt 34680
t^tttttag agacggggtc ttactatgtt gcccaggctg gtcttgaact cctgggctca 34740
agtgatcctc ccacctcagc ctcccaaagt gctgggatta gaggcatcag ccaccccgcc 34800
cagcgagcta tggatctaac atgtacatct tacacagtgc taatagaatg ttgggtttct 34860
tccccaatat tttattttga aaaaaaattc aaatatatag aaaagttgaa aaatgtagtt 34920
caaagaacac ctacatacct ttcacataga ttcatgattt gttaatgtta tgccactttg 34980
tatatatctc tctccctcct atctgtatac ttttatttat ttattt^^ 35040
cagagtaact taaaggcatc ttgattttac ccttgaacag ttcaatatgt ttctgctaag 35100
aattctccta tataagtcag atatcattac atctaagaaa attcacggca attttacaat 35160
ataatattat agtccaaatc catatttcct cagttgttcc aaaaaatgtt catggctgtt 35220
tcctttttta atctaaattt gaatccaagt ttgaggcatt gtatttggtt gctgtgtctc 35280
tagggttttt aaaatctgtg ccttttcttc tccccatgac tttttagaag agtcaagacc 35340
ggttattctt atagaataac ccacattcta gatttgcctg attagttttt ttatacttaa 35400
cgtatttttg gcaagaacat tacattggta acgctgttgg tgatgggtca gttttgaaga 35460
gtggagatga ttaaactgct tttgttcatt gaagtatctg tcaagaccag agatccttaa 35520
ctggtgccat aaataggttt cagagaatcc tttatatata caccctgtcc cccacctaaa 35580
ttatatacac atcttcttta tatattcatt tttctagggg aggcttcttg gcttttatca 35640
aattctcaga gggccccaag acccaaagag gttatgaaac actagtctgt ccactgaggc 35700
aggcaacaca gagctggttt ctggggcctt gttcagtctg aacMgcttc 35760
atagcacaag gctgtaactt tgccccatct tggctttgga tcaaagagga ctgtccattt 35820
tgttgtcata cctaggaacc agggacagct tatgtggcct ggttccaggg atccaggaga 35880
atttcagttc ttgtcttgcc tttcaggtgt tcagaatgcc aggattccct caccaactgg 35940
tactatgaga aggatgggaa gctctactgc cccaaggact actgggggaa gtttggggag 36000
ttctgtcatg ggtgctccct gctgatgaca gggcctttta tggtgagtga atcccttcat 36060
atctgcccct cttggtcttc agagtccatt gacagtgctt ccagttccct gtggcctgtt 36120
aatc^ttag tctttccatc agccagggca tctcccttta tttattcatt cattcaacta 36180
gcaggtatca attgagcacc tactaagtga aaggtaagat ccttccctca aagacttaat 36240
agttgaacgt tgggagtggg aggagaggca ggcagagagg agacacaata tagttggata 36300
aggacctcca cggagagtgt tacaggctga gaggaggata ^cttaggtt gtctttaggg 36360
aatcagaaaa ggagactctg gaatag'gertg gcagagagag gggctacctc ctatacctgc 36420
tctggacaaa cgactttaag catagtgaca gatttgccaa ccctgtattg gaagaactga 36480
tcttttttag tggggatgat tacttctggg gatttcttct cataactgag accaaaacag 36540
ttttgtgcag tctcagaaat gacaggaggt accaatctga cacttccttt ggaagctcta 36600
gggcagagag tgaaagagtg gattttgacg ggggccttgc ttggaggtca ttcacccacc 36660
cctgtcctca ctccagcaac agtgataact cacttccttc ctccctttgt acacccttct 36720
ccccacctgc tcacaggtgg ctggggagtt caagtaccac ccagagtgct ttgcctgtat 36780
gagctgcaag gtgatcattg aggatgggga tgca^tgca ctggtgcagc atgccaccct 36840
ctactggtaa gatagtggtc ctttgtctat cctctcccat ataagagtgg ctggcgggga 36900
gggacagtgg cagggtgagt tgggcagaag gagtgttagg gtagtcagag cattggattc 36960
ttaccacagc agtgc^ctta accagctctt taacttgtaa gcagaatgat ttacacatgt 37020 
ctctaccctt tttccttacc aaccttgaaa atgtcttcac tctgccctgc aatcctccca 37080 
gtgggaggca ctcttcaagg acgatcccag aacattaaag tcaaagaccc cttagagctc 37140 
accctgtcca accaccttgg ttgataaaag aagtcagcct ggggcccatg gaatagaata 37200 
gtacaagggc aaggttctca ttgtgagtca aaggtagagt gaagagaacc cagaccatct 37260 
caccccaacc caggccagtg tttttccaaa tataccactt gctgcagatc tagctcagca 37320 
cccccagtcc cagcccaccc tgagaaccca ggctcctcat tctgagcagc cagctagaat 37380
catgacaaag agggtgc^ag tgagactatg ggtactgttg cttaaagcca catggtgcag 37440
tggttgctgg ggggcttctg tgtgggactc tagcatctta ttcccccctg tgccctctcc 37500 
ccagtggg4Ui gtgccacaat gaggtggtgc tggcacccat gtttgagaga ctctccacag 37560 
agtctgttca ggagcagctg ccctactctg tcacgctcat ctccatgccg gccaccactg 37620 
aaggcaggcg gggcttctcc gtgtccgtgg agagtgcctg ctccaactac gccaccactg 37680 
tgcaagtgaa agagtaagta ttttgagaac ccttcagcag gggttcttga gcagagtctg 37740 
taaatgggcc tcagagggct tagacctcca aagtctcatg cagaactccc tttattctca 37800
tctcatatct ttctcctgga ccccactatg ctgtaaccgt acctgggcct tggcacttac 37860
tgttctctct gcccaggcta cttcctaccc gatacttaag gcaagaatca ctcacctttc 37920
cggtgtcagg tttcaggtca tgtttgctct ttgaaatcat ctggcttgat tatgtgtatt 37980
agttgtttat cttctatccc ctccactaga atgtaaattc cagaagaaac ttgctgtctt 38040
attcagtgct gcatgcccag ggcttggaag agtacctggc atatagtagg agttgattga 38100
ttattatttt gtcagtcgag agaatgaatg gagaaaatgt ggtccatggc ccaaaagaag 38160
ttaagaccct atcctagatt caggccagag accagatgga gaaagagtct gtgtctatct 38220
cataccagta atgtcgtacc tctggccgct taccatgtaa atattgattg tgtatctacc 38280
atgtgttgga cactaggcta gtgcttgcac agcaggtgaa agatactaga gtttgggaag 38340 
tcaggaggag ctaaggtctg ttctacaacc ttattagatg aagaggagag ggaattgtgt 38400 
tcagggcaga gggagaagca tttctccaaa agtaggagtc ttaatcatgt ctgatgtagg 38460
ttgagtgtgg ccagaaaagg ggctgttaag tatagagggc ctggattatg aaaatccagc 38520
agatccattg agagtttaag cagcaaggtg ttgtgaccaa gttaacattt tagaaggatc 38580
actggtatgg aggttggatt ggagagggga aagcctaaag gtatagagac tagttaggaa 38640
gctattgtag gctgggcatg gtggttcatg cctgtaatct cagcactttg ggaggctgag 38700
gtgggaggat tgcttgaggc caggagttga agaccaacct ggccaacata gcaagacccc 38760
gtctctgttt ttcttaatta aaagaaaagt ccagacgtag acatagtggc ^cacgcctgt 38820
aatgccagca ctttgggagg ccaaggtggg cagattgctt gaggtcaaga gtttgggatt 38880
aggccaggcg cagtggctca cgcctgtaat cccagcactt tgggaggccg aggtgggcgg 38940
atcacaaggt caggagatca agaccatcct ggctaacaca atgaaacccc gtctctacta 39000
aaagtacaaa aattagccgg gcatggtggc ggacgcctgt agtcccagct ac^cgggagg 39060
ctgaggcagg agaatggcgt gaacctagga ggcggagctt gctgtgagca gagatcacgc 39120
cactgcactc cagcctgagc gacagagcga gactcca^ct caaaaaaaaa aaagagtttg 39180
ggattagcct ggccaacatg gcaaaacccc atctctacaa aaagtacaaa aaaattagct 39240
gggtatggtg gtgcgcgcct gtaatcccag ttactcagga ggctgaggca tgagaattgc 39300
ttgagcctgg gaggtggagg ttgcagtgag cccagatcat gccactgcac tccagcctgg 39360 
atgacagagt aagatgccat ctcaaataaa aattaaaaac aaagtttaaa aaaaaaatag 39420 
aagctattac cgtgatccag gtaagagatg tgaataacta caatgatgga aagaaggcag 39480 
agttcttaga gatgggagta ggagagatga gggaactcca gattgggaag atgatgttca 39540
agtttctggc ttaggccaca gggtgagtgg caattccctt cactgagatg gggcatcctg 39600
gaaaaggtgt tgcctttctg tgtgggtatc ctgggcccct taggggccac tggtggcctg 39660
ggacctggta aaccttccct gcacaagcag aattggtcaa gcaggttttt aggacatctt 39720
taccctgcct caactcttgt ctggcccagg gtcaaccgga tgcacatcag tcccaacaat 39780
cgaaacgcca tccaccctgg ggaccgcatc ctggagatca atgggacccc cgtccgcaca 39840
cttcgagtgg aggaggtaga gtgtgtgtct aatctgtctt gtgagggtgg gacatggaac 39900
agatcctctg ggaaatcagg ctgtagcctt taccttttcc tacccccagc ccatctcttt 39960
gtcttagcat tgagcctgtg accactggtg acctatttca gcgtaacagg ttcccagggt 40020
agcagggatg gttgatggac gggagagctg acaggatgcc aggcagaggg cactgtgagg 40080 
ccactggcag ctaaaggcca ccattagaca agttgagcac tggccacact gtgcctgagt 40140 
catctgggtt ggccatgggt ggcctgggat ggggcagcct gtgggagctt tatactgctc 40200 
ttggccacag gtggaggatg caattagcca gacgagccag acacttcagc tgttgattga 40260 
acatgacccc gtctcccaac gcctggacca gctgcggctg gaggcccggc tcgctcctca 40320
catgcagaat gccggacacc cccacgccct cagcaccctg gacaccaagg agaatctgga 40380
ggggacactg aggagacgtt ccctaaggtg ccacctccca ccctggctct gttctgtcct 40440
atgtctgtct ctcggatgaa gctgagctgg ctttcagaag cctgcagagt taggaaagga 40500
accagctggc cagggacaga ctatgaggat tgtgctgacc cagctgcccc tgtggggatc 40560
acagtttaca gccagagcct gtgcggaccc agctgtctgc caggtttcct tagaaacctg 40620
agagtcagtc tctgtccact gaactcctaa gctggacagg aggcagtgat gctaaaccct 40680
gaagggceuic atggcctatg gagaaagcat ggagctcaga gcctggagta cgggcacaga 40740
taggattgaa taaattgtgt agaaagactt tgaaaacaat aaagcaaaag atgaatgaac 40800
gt^tttttta gacttgaggg accaacaacc cccaaacccc agattctgcc aggtccatgg 40860 
ggaaggagaa gttgccttga gtggaagccc caagtaggga gacttacaga aaagaagtca 40920 
agagcactgg ctcccaggca gaaatactga taccctactg gggcttcagg ctgagctcct 40980
cccttcacaa atcacttcat ctctctgagc ct^tttctgc atctgtgaca taagatggta 41040
agataaaggt ggctgtctca ccaattatgt aaggattaaa tgtggaaaag gacataaagt 41100
tgtatagtgc tgccataggg acagtgttca gtaaacgtga cacattctta gtatcactaa 41160
gaatcaggtt cttggccagg caccgtggct catgcctgta atcccaacac tctgggaggc 41220
ctaggtcgga ggatggcttg aacacaggag tttgagacca gcctgagcaa catagtgaga 41280
ccctgtctct acaaaaaaaa aataataata ataattgttt ttaattagat gggcagggca 41340
ctgtggctca cacctgtaat cccagcactt tgggaggcca aggccggagg attgcttgag 41400
gccaggagtt caggagcagc ctgggccaca ttcctgtctc tacaaagaat aaaaaagtta 41460
actgggcatg gtggcacatg cctgtaatcc cagctactca agaggctgag gaggaggatt 41520
gcctgagccc aggagttcaa gactgcagtg agccttgatc acaccactgt actacagctt 41580
gggcaacaga gtgagacctt gtctccaaaa aaaaaagttt gttttttttt atccactctc 41640
ctcaccaaac aaactgagta agttagagcc ctctcagctg gcatgtgttg gaaacagtgc 41700
cctctcatta aagtgctgcc ctcactccca ttgcctcttg gccttggtca gtatgatgaa 41760
attaglggga ggcagggcaa cagagggcag ggaagagcta gaaatccatg gcctggaaaa 41820
gggaagattt gggagtggcc aggtatctgt agagccacca tgcagaggag gggggcagct 41880
agccttgtgt gctctggtgg gcatggtcag caggaggcag agcaaaagga caagggtaag 41940
^aeuicctgta ggtcgggaca agccaagagc catccagcgt cagtcctctc tgggtagccc 42000
aag^aaagca ggagcatacc ccagagagaa agttcgcagg gctgttcacc tgcag^'ctg 42060
tggacttcaa ccttcttgtt ccttcttcag taagtgaaaa taacagtcat tgaccatgac 42120
^ttatcgac cgcttttgaa aatgtaaaca tagtgacttt attgctgtaa aaatcatacg 42180
tgt^tatcat cttaaaattc aggaaacatg gacagg^aca aagatgtgca aaatatcatc 42240
caaaatccca tttgctggcc aggcacggtg gctcacgcct gtaatcccag cacattggga 42300
ggccgaggcg ggcaaatcac ttgaggtcag gagtttgaga ccagcctggc caacatggtg 42360
aaaccctatc tctactaaaa atacaataat taggctgggc gcagtggctc acgcctataa 42420
tcccagcact ttgggaggcc gaggtgggcg aatcacaagg tcaggagttt gagactagcc 42480
tggccaatat ggtgaaaccc catctctact aaaaatacaa aaattagggc cgggtgtggt 42540
ggctcacgcc tgtaatccca gcacttaggg aggccgagac agatggatcg cgagatcagg 42600
agttcgagac caacctagcc aacatggtga aaccccatct ctactaaaaa aatacaaaaa 42660
tta^tcggti gtggtggcac acgcctgtaa tcccagctac ttgggaggct gaggcaggag 42720
aatctcttga acctgggagg cagaggttgc agtgagtgga gatcccgccg ttgcactcca 42780
gcctgggcga cagagtgaga ctccatcaaa aaaaaaaaaa aaaaaaaaaa aaattagccg 42840
ggcgtggtgg cgtgcaccta tactcccagc tacttgggag gctgaggcag gagaatcgct 42900
tgaacctgga aggcggaggt cgcagtgagc cgagatcgtg ccattgcact tcagcctggg 42960
cgacagagcg agactctgtc tcaaaaataa taataa^aac aataactagc cgggcctggt 43020
ggcacatgcc tgtagtccca gttactcagg aggcggaggc atgagactca ggtgaactag 43080
ggagacagag gttgcagtga gccaagatca caccactgca c^cagcctg gttgacagag 43140
cgagactctg tctcaaaaaa aaaaaaatcc catttgctca ttttttggat actagtataa 43200
ctatcactct aaaccagtta gtacttaaat caagcagata tgggagatgg tgaattacca 43260
tctacagtgt tgtcatatat g^cacatact gagcattatc agc^gtaga atctagttaa 43320
ttgttctatg tgtgatgtat gcagagttcc cattttgaat gtgtttttac tatgcttaaa 43380
taaatgactg atgtcagcaa ccccaaaatg atacatctga tgtaagagcc cctgttcccc 43440
aataataaca tctaaactat agccattgga atgaacaggt gcccctaagt ttcctccctc 43500
cagggtttct tggccggtct ctgaggacta cccatcccta ctcccgtctt tcctcatctt 43560
caggcgcagt aacagtatct ccaagtcccc tggccccagc tccccaaagg agcccctgct 43620 
gttcagccgt gacatcagcc gcftcagaatc ccttcgttgrt tccagcagct attcacagca 43680 
gatcttccgg ccctgtgacc taatccatgg ggaggtcctg gggaagggct tctttgggca 43740 
ggctatcaag gtgagcgcag gcaacaattg ctttgctctt ctgcccccag tccctctgtc 43800
actgtctttc ggggatttct catcacttgg ccccacccca caccatgcag gatgccaggc 43860
ctccttcctg gctttgggtg ttggtgtgag aggtatcctt cacccccacc caggccacct 43920
aaggtcaatg ttgctgttac agtgagcttg tggacctgga gatccaggtt gggttgagct 43980
gtgcctgtgg ccctcctgcc tccagtcagt gggtgtttgt taggtgcctg cagacctcag 44040
taccgggcat gctacaagga gcacacaggg gaatggctcc tgcctccctg gtgaacagtc 44100 
tcagggacta acctctctct ttctctcctc ctcctcctct tctgctgaga actgggaggg 44160 
ggggtcaggt aagacgtgtg tctcagcttg ggggcagcag ggctggagag ctcacccccg 44220 
atccacccag ctccctggtg catgtctttg gcactgacct tcctgccccc agacttctgt 44280
tcactcagga gactcacttc tatgccaaat gaccagagcc cctgcttggc ttggcagcat 44340
cccctcctgc cttcttcccc acttcccttt tctgggttct tgcctgtcct ctg^gcatgc 44400 
ccagctctcc aggaaagagg gtttgcttcc gtgtgag^cc catgttgctc cacgctgcat 44460 
cttccacaca tgaactctgt cattctgacc cggctcagtg tgccctccaa gggatgggat 44520
ggccagctgc atagattttc tcaaacagtt ctccagaact tcctctggtc tcagcaccat 44580 
taacagtcac cctccctgta ggtgacacac aaagccacgg gcaaagtgat ggtcatgaaa 44640 
gagttanttc gatgtgatga ggagacccag aaaacttttc tgactgaggt aagaagatgg 44700
agggggcccg ggaggttggt gtcaccattg gaagagagaa gaccttacaa ataatggctt 44760 
caagagaaaa tacagtttgg aattactgtc ttaaagacta agcagaaaag agccctagag 44820 
gaatatccca ctccctctaa attacagcgt aattatttg^ tcaatgaaca cttactaaaa 44880 
gcaacacaaa cagggtacaa gggatgcagt aacaaaagat acagggttca gaagagctct 44940 
caggttatga ggatgatgga catgaaaaca ctccaattta ^tacaactca atgttataat 45000 
cctcacctga acgccctgct aagggagcct ggaggggagc tccctgagca ctcacactcc 45060 
ttgggcattt acagttttca ctacccctcc caagttactt catggagtaa cttaagttgg 45120 
ggacacctgt ggtctgggta ttgccctcca agccacttgg ccactcccac cccagttctc 45180
ccaatgcagt tccaagggta aggcctatga agccatctcc atctatatgg tggtggtctt 45240
ccctcatcct gatcttagtg ccctgtcata tcacaagata ggaggtagga gatacaggtg 45300
gtaacacttg tcaagctgat tccttggagg gaagaggtaa ggaagacagt gagaagttaa 45360
ccaccagctt tccttggctt cccccacccc caggtgaaag tgatgcgcag cctggaccac 45420
cccaatgtgc tcaagttcat tggtgtgctg tacaaggata agaagctgaa cctgctgaca 45480
gagtacattg aggggggcac actgaaggac tttctgcgca gtatggtgag cacaccaccc 45540
catagtctcc aggagccttg gtgggttgtc agacacctat gctatcacta ccctaggagc 45600
ttaaagggca gaggggccct gctttgcctc caaaggacca tgctgggtgg gactgagcat 45660
acatagggag gcttcactgg gagaccacat tgacccatgg ggcctggacc acgagtggga 45720
cagggctcaa cagcctctga aaatcattcc ccattctgca ggatccgttc ccctggcagc 45780
agaaggtcag gtttgccaaa ggaatcgcct ccggaatggt gagtcccacc aacaaacctg 45840
ccagcagggc gagagtaggg agaggtgtga gaattgtggg cttcactgga aggtagagac 45900
cccrftcctat gcaacttgtg tgggctgggt cagcagctat tcattgagtt tgtctgtgtc 45960
actgaaactg accccagcca actgttctca gttcacagcc ctgttttcaa agaattacac 46020
atctctaaag gcaaacaggg cacggacaag gcaaactgga gaggcaaact gtagcctgag 46080
atggcctggg cttgccatca caggtattca ggtgctgagg gcccttagac caactagagc 46140
acctcactgc ctaggaaatc aatgaagggg aaatgagttc tagcggagcc ctgaaggatc 46200
agaattggat aaagttctta ttggcagaga ggcaccagga ttgaagtgac aggagcaaag 46260
acctgggagg aaagaggaga aaatcatcta tttcacctgg aaacaaatga ttccaagcat 46320
agaaataata acagctgaca agtactgagt gcccbctata ^gctaggcac tgggctgagg 46380
gattaacatg catgtgcatg ttta^tcctc atgacaacct ^ggtttccag ataagctgga 46440
ctggaaaggg acagagctgg gatcctgggc taatcagtct ggtcgccaag cctgagactt 46500
tagccactgc ccttcacatg ggggtccatg aaaatagtag ^agtctggaa cagtttgggg 46560
gtacatcaag gtcgctgtgt tttaagctat ggagtctgga ctataggaga caaatgtaaa 46620 
agagtttttt ggttgactgg ctttttggtt tttttgtttg tttgtttgtt tgtttgtttg 46680 
tttgtttgtt ttttcctgtt tctggggctt gaatcaggaa ggaggttttt ttgttgttgt 46740
tgttttgaga aaggatattg ctctgttgcc cagac^ggag tgcagtggca cgat^catggc 46800
tcactacagc ttcgacctcc tgggctcaag caatcctcct gccttagcc^ cccaagtagc 46860
iggactacag gtgtgtacca ccacacctaa ttttttgaat ttttttttct ^tttttttt 46920
tttttttttt ggtagagaca ggttctcact ttgt^gccca ggcctgaatc tcaaactcct 46980
gggctcaagc attcctcctg cctcgccctc ccaaagtgtt gggattacag ttgtgagcca 47040
ccatgcccgg caggaaaaga tttttaagca agaaagctta agagctgtgg tttttccaaa 47100
atgagtctgg gctggcacag tggctcatgc ctgtaatccc agcacttttt tgggaggccg 47160 
aggtgagtgg atcacttgag gtcaggagtt tgagaccagc ctggccaact ggtgaaaccc 47220 
ctgtttctac taaagaaaaa aatgcaaaaa ttagctgggc gtggtggtgc acgcctgtag 47280 
tcccagctac tcaggaggcc gaggcaggag aatagcttga acctgggagg cagaagttgc 47340 
agtgagccaa gatcacacca ctgcattcca gcctgggtga cagagtgaga cttcatctca 47400 
aaaaaaaaaa aaaagagaga ctgatatggt tagtacattg gggtggaatg cggagggtcc 47460
agggaatgga gccctgcata gggggctaat gaaacatttc agatttctga attaaggtag 47520
tggctgtggg gacaggagcc tgggaggcag ggtggagtca gaatggagag actggttggc 47580
aatgagggaa caggaggagg aggaggagga gttacgagtg gcttgaggtg tcacttacca 47640
gccatttggg ggatggggga tagccgtgat tgttgagcaa ctggtttggg aagagctagc 47700
attgatccct gctgttctgt gctagcagaa cctatcagca tcttctgggc aggaaactgg 47760
ctccatgaga ctggcttagg gagaggctgc tagtcaccta atctgcagag aaggggcagc 47820
tggagctgtg ggacagaaga ggcatccatg tagctggtgg gggtgtctca gcttgtgaag 47880 
aggagatggc tttgagcagg gctgacactg aeiaaggctgg aagaaaaaaa cagacacaca 47940 
agagtctcag gatcaggtag cataggaaag ttgtggacag tctttgagga gcactccctc 48000
aggcaggcag gcaggcaggt catgagctat agcgattcag gaagagctcc ctgggtgtgt 48060
gagcagctcc aggagcctaa gggatgaaag tagtatrtgca gggggctgga gagcaaggag 48120
t^gctccttc tacatttgca agggaaggag aaaggaagtt gctcctgaga gtggtaagag 48180
tcagtggtgg aggcctggag aggagacata acaaacaaat ttgttgacaa acattttggt 48240
*g?»*9gggg. «g«gcttaaa>ttt^^ tgggg««??t ggagtcttag aggaggtgaa 48300
tgtctgaaag acagagctag ctggagcaag aagtcacttc tctgttgcag gcaggaagga 48360
tccaaagtgg ctcaagccag agattgggag agtggggagg agggagcagc ctggatctaa 48420
gtaaaatggg tagaggtgga gggggtgctg caacggccag ggttttctga agttggggac 48480
attaggagag agctgtgagg gctttggcca gccactgtgc tagtgattgg tgaaccaaag 48540
gatgggcagg agatggcagc agggaagcag aggaagtcca ggcttcctgt tggtattggg 48600
acaagggaga ggccatagga ggccctggcc ctgttgtcca ggttgggttc tgaagctggg 48660
tgggcatggc ctggtaggag agcatctatg gcgcccaatt ccagattcag ggtctagttg 48720
atttgctggc cctgtagcct cagctcatgc ttctgttcca ggcctatttg cactctatgt 48780
gcatcatcca ccgggatctg aactcgcaca actgcctcat caagttggta tgtcccactg 48840 
ctctgggcct ggcctccagg gtcctatcct tcctggcttc cttgtcacaa aggaggctga 48900 
cttgtcccct ctggctagag ggcagaggtg ttgcctagga gctcctatct ttcccttcct 48960 
gcttcttcca atgcccttct ctgtcctctg ggagctccga gacacacaca gacataattt 49020
caccttctct cattagcaac ctttgaaata atttgattag aagggacttc agaagtttgt 49080 
tgactatatg tagaaaaccc tgtcatttta cctgcttttg ccccatagta gtcttgtaaa 49140 
acagttcatt gctgacccca ttttacagtg gtggcacctg aagcct<;agc ctgaggccac 49200 
qgagctagta aatttacagg gaccagtttg agaccagcat tcctcccact gccc<^tcagc 49260
tgtggtggtt acaatgttgt ttgtcttact gacttgctat ctggcttcct gggtgtctac 49320 
cggctggccc tggctctgcc ctctagaccc acaccacgca atcttcattc ctttcccaca 49380 
tgactgccct gtagctattc aaagagcttg tctcccccaa gtctccccat ctactgcctc 49440 
caccttgcct ttttctgtct tatcctggtt ctagccactg cctgaaatca ttttaggaat 49500 
aagacaggac agggaaaaac aaaagcaacc ccctgtccca cctctgagtt ccactctcca 49560 
agtccctgag cctcacctcc agggctccag tggctctgcc atgaacccac tgtgggctgg 49620 
gagtctgctg tgcacagata ccagaccctc agaaacacaa atgccaagtg tgtctgtttt 49680 
tttgttttgt tttgttttgt tttttagatg gagtctcatt ctgtttccca ggctggagtg 49740 
cagtggtgca atcttggctt actgcagcct ctacctcccg ggttctagtg attgttctgc 49800
ttcagcctcc cagtagctag gactacaggc gtgtgccacc acgcccagct aatttttttt 49860
tttttttttt tgtattttta gtagagacag ggttttgcca tgttggccag gctggtcttg 49920
aactcctgac ctcaggtgat tcacccgbct tggcctccca aagttctggg attacaggtg 49980
gaagccaccg tgcctggcct gagtgtgtct atttgataga gctttctgct ctgattctcc 50040
cttgctatac accttttctc cccttctcag tggcttctct tgcctatgct tcctccccag 50100
ggccaggttt gagaacatcc ccatgaagtc ctgacctgtc ttttatccta ccaggacaag 50160
actgtggtgg tggcagactt tgggctgtca cggctcatag tggaagagag gaaaagggcc 50220
cccatggaga aggccaccac caagaaacgc accttgcgca agaacgaccg caagaagcgc 50280
tacacggtgg tgggaaaccc ctactggatg gcccctgaga tgctgaacgg tgagtcctga 50340
agccctggag gggacacccg cagagggagg acagatgctg cccttgcatc agagccctgg 50400
gaattccagg ggaggcctgt gaagcgtagg accggatacc cagagctgag gatatttttc 50460
ccttgccagg tggggcctca cgatttagct cctgagctca gggggctggg aactgatcag 50520
tgtcccatca tgggggataa ggtgagttct gcctgtggca tttgtgcctc agggatcgct 50580
aagagctcag gctattgtcc cagctttagc cttctctctc catggtgaga actgaagtgt 50640
ggtgccctct ggtggataat gctcaaacca accagagatg ctggttggga ttcttgaaat 50700
cagggttgtg aggcctcaga aatggtctga atacaatcca ttttggagtc tgaggcccag 50760 
agaagttcag tgaattgcct aggagcatac agctgcctaa tggcagaggc tagatgaacc 50820
ctagtctggt tcttttccac tttaacgtgc agtttcatcc taggcagtgt tatgttataa 50880
gggctctcca aggcagttca cctacggctg aggaaggact attttcaggt ggtgtctgcg 50940
caggacagcc tgtggggtgt ccctacagaa cctgttctag ccctagttct tagctgtggc 51000
ttagattgac cctagaccca gtgcagagca ggtaagggat gtaaacttaa cagtgtgctc 51060
tcctgtgttc cccaaggaaa gagctatgat gagacggtgg atatcttctc ctttgggatc 51120
gttctctgtg aggtgagctc tggcaccaag gccatgcccg aggcagcagg cctagcagct 51180
ctgccttccc tcggaactgg ggcatctcct cctagggatg actagcttga ctaaaatcaa 51240
catgggtgta gggttttatg gtttataacg catctgcaca tctttgccac gttcgtgttt 51300
cattggtctt aagagaagga ctggcagggt ttttttgttt tagatggagc ctcacttcgt 51360
tgcccaggct ggagtgcagt ggcacaatct gggctcactg caacctctgc cttctgggtt 51420
caagtgattc tcctgcctca gcctcccaag tagctgggac taccggcaca caccaccatg 51480
cccggctaat ttttgtattt ttagtagaga cagggtttca ccatgttggc caggctggtc 51540
ttgaactccg gacctcaggt gatccgcctg cctcagcctc taaaagtgct ggaattaata 51600
ggcgtgagct acctcgcccg gccaggtttt tttttttttt tttttagttg aggaaactga 51660
ggcttggaag agggcagtgg cttgcacatg gtcgataagg ggcagatgag actcagaatt 51720 
ccagaaggaa gggcaagaga ctgttcatgt ggctgtctag ctagctcttg ggccaaatgt 51780 
agcccttctc agttcccttc aagtagaagt agccactcta ggaagtgtca gccctgtgcc 51840 
aggtaccacg tggacagagt gaggaatctt ggaaagattc ctacctttag gagtttagtc 51900 
aggtgacagc atatctcagc gactcaaaca cacacacatt caaagccttc tgtaattcct 51960
ttcaaagttgt gaggggtaga ggagaggaga gacaagggat ggttaggata atgaaggaat 52020 
gttttgtttt tgtttttgtt tttgagatgg agtttcactc tgtcacccag gctggagtgc 52080 
agaggtgcaa tcttggctca ctgcagcctc cgcctcccag gttcaagcaa tcctcctgcc 52140 
tcagcctccc aagtagctgg gactacaggt gtgcgccacc acgcctggct aatttttgta 52200 
ttttcagtag agacagggtt tcgccatatt ggccaggctg gtctcaaatg cctgacctca 52260 
ggtgatacac ccgcttcagc ctcccaaagt gctgagatta caggcatgag ctaccgtgcc 52320
tggccatgaa ggaagatttg ttttaaaaaa ttgttttctt taatattaat tgaacacctc 52380
tgttcagagc actgggctgg tgccagaggg tttcagacat gaatcagatc cagcacctca 52440
tagagcctta atctggcaca cacacacagc cacaaggaga cacagacaag gcagggtagg 52500 
atgagtggaa gctaggagca gatgctgatt tggaacactt ggcttctgca gtgaagcccc 52560 
ttcttagtcc tcttcagtaa cccagctctc agtggataca ggtctggatt agtwgattt 52620 
ggagagatga ttggggattg gggagagctc tctaacctat tttaccacct cctcttctgc 52680 
cattcttcct gtccacatcc ccagcatccc tttcccttgc caagtatctg tggcctctgt 52740 
cgtcctttgt aaacagctgt cttcttaccc tacagatcat tgggcaggtg tatgcagatc 52800
ctgactgcct tccccgaaca ctggactttg gcctcaacgt gaagcttttc tgggagaagt 52860
ttgttcccac agattgtccc ccggccttct tcccgctggc cgccatctgc tgcagactgg 52920
agcctgagag caggttggta tcctgccttt ttctcccagc tcacagggtc ctgggacgtt 52980
tgcctctgtc taaggccacc cctgagccct ctgcaagcac aggggtgaga gaai^cttga 53040
ggtcaagaat gtggctgtca acccctgagc catctgacaa cacatatgta caggttggag 53100
aagagagagg taaagacata gcagcaagta atctggatag gacacagaaa cacagccatt 53160
aaaagaaagt ttaaaagaag gaaattcacc caaaccattt gaatacagta agtgtattca 53220
tctttcgata ttcccctgtc catatctaca catatacttt tttttatagt aaatagttct 53280
gtattttgcc ctgcatttcc cttgtgttta ctatccagtc ttcctgttta tcatttttgt 53340
cgacaacatg aaattctatt gagagactgt ctgaacatat tgtaatgtag atgttcaggt 53400
ttttccagtt tctctttaca ataggtattt aactacagtg agcagtttta tgcatttagc 53460
taatttctcc tttgaggaag tattttcaaa attaccttta ttcttctcag gtaataattt 53520
cattattacc aaagttaccc taggtctttt caagtgtgtg gttaaaaaac gagaatctgg 53580
ctgggcgcga tggctcacac ctgtaatccc agcactttgg gaggctgagg ctggtggatc 53640
acctgaggtc tggagttcga gaccagcctg gccaacatgg tgaaacccca tctctactaa 53700
aaatacaaaa cttagccagg catggtggca ggtgcctgta accccagcta cttgggaggc 53760
tgaggcagga gaattgcttg aacccagggg cggaggttgc agtgagccga tatcacgcca 53820
ttgcactcca gcctcggcaa caagagtgaa actctgtctc aaaaatgggg ttcttttcct 53880
gccatcaaaa atcatgtttc ttttaaaaac aagttcaaac attaccaaag tttatagcac 53940
ag^AAOtacg tcttctgtaa tctcccttaa ccaatatatc cctcaa^ 54000
ccaactccac cctcccagga taaccagttg ggacataatc tttatttaaa aatggtttcc 54060 
ggatagagaa agcgcttcgg cggcggcagc cccggcggcg gccgcagggg acaaagggcg 54120 
ggcggatcgg cggggagggg gcggggcgcg accaggccag gcccgggggc tccgcatgct 54180 
gcagctgcct ctcgggcgcc cccgccgccg ccctcgccgc ggagccggcg agctaacctg 54240 
agccagccgg cgggcgtcac ggaggcggcg gcacaaggag gggccccacg cgcgcacgtg 54300 
gccccggagg ccgccgtggc ggacagcggc accgcggggg gcgcggcgtt ggcggccccg 54360 
gccccggccc ccaggccagg cagtggcggc caaggaccac gcatctactt tcagagcccc 54420
ccccggggcc gcaggagagg gcccgggctg ggcggatgat gagggcccag tgaggcgcca 54480
agggaaggtc accatcaagt atgaccccaa ggagctacgg aagcacctca acctagagga 54540
gtggatcctg gagcagctca cgcgcctcta cgactgccag gaagaggaga tctcagaact 54600
agagattgac gtggatgagc tcctggacat ggagagtgac gatgcctggg cttccagggt 54660
caaggagctg ctggttgac^ gttacaaacc cacagaggcc ttcatctctg gc^ 54720
caagatccgg gccatgcaga agctgagcac accccagaag aagtgagggt ccccgaccca 54780
ggcgaacggt ggctcccata ggacaatcgc taccccccga cctcgtagca acagcaatac 54840
cgggggaccc tgcggccagg cctggttcca tgagcagggc tcctcgtgcc cctggcccag 54900
gggtctcttc ccctgccccc tcagttttcc acttttggat ttttttattg ttattaaact 54960 
gatgggactt tgtgttttta tattgactct gcggcacggg ccctttaata aagcgaggta 55020 
gggtacgcct ttggtgcagc tcaaaaaaaa aaaaaaaaat gatttccagc ggtccacatt 55080 
agagttgaaa ttttctggtg ggagaatcta taccttgttc ctttataggc caaggaccgc 55140 
agtccttcag taacaccagt gtaaaagctt gaggagaaat tgtgaagcta cacagtattt 55200
gttttctaat acctcttgtc attctaaata tctttaattt attaaaaaat atatatatac 55260
a^tattgaat gcctactgtg tgctaggtac agttctaaac acttgggtta cagcagcgaa 55320
caaaataaag gtgcttaccc tcatagaaca tagattctag catggi^tct* actgtatcat 55380
acagtagata caataagtaa actatattga ata^t^gaat gtggcagatg ctatggaaaa 55440
agagtcaaga caagtaaaga cgattgttca gggtaccagt tgcaatttta aatatggtcg 55500
tcagagcagg cctcactgag gtgacatgac atttaagcat aaacat^gag gaggaggagt 55560
aagcctgagc tgtcttaggc ttccggggca gccaagccat ttccgtggca ctaggagcct 55620
ggtgtttccg attccacctt tgataactgc attttctcta agatatggga gggaagtttt 55680 
tctcctattg tttttaagta ttaactccag ctagtccagc ctt^tatag tgttacctaa 55740 
tctttatagc aaatatatga ggtaccggta acattatgcc catttctcac agaggbactui 55800
ctaggtgaag gagtttgcct gacgttatac aaccaggaag tagctgagcc tagatccctt 55860
ccacccaccc catggccctg ctcatgttcc acctgcctct aatttacctc ttttccttct 55920
agaccagcat tctcgaaatt ggaggactcc tttgaggccc tctccctgta cctgggggag 55980
ctgggcatcc cgctgcctgc agagctggag gagttggacc acactgtgag catgcagtac 56040
ggcctgaccc gggactcacc tccctagccc tggcccagcc ccctgcaggg gggtgttc^a 56100
cagccagcat tgcccctctg tgccccattc ctgctgtgag cagggccgtc cgggcttcct 56160
gtggattggc ggaatgt^ta gaagcagaac aagccattcc tattacctcc ccaggaggca 56220 
agtgggcgca gcaccaggga aatgtatctc cacaggttct ggggcctagt tactgtctgt 56280 
aaatccaata cttgcctgaa agiCtgtgaag aagaaaaaaa cccctggdct ttgggccagg 56340
aggaatctgt tactcgaatc cacccaggaa ctccc^ggca gtggattgtg ggaggctctt 56400
gcttacacta atcagcgtga cctggacctg ctgggcagga tcccagggtg aacctgcctg 56460
tgaactctga agtcactagt ccagctgggt gcaggaggac ttcaagtgtg tggacgaaag 56520
aaagactgat ggctcaaagg gtgtgaaaaa gtcagtgatg ctcccccttt ctactccaga 56580
^cctgtcctt cctggagcaa ggttgaggga gtaggt^ttg aagagtccct taatatglgg 56640
tggaacaggc caggagttag agaaagggct ggcttctgtt tacctgctca ctggctctag 56700
ccagcccagg gaccacatca atgtgagagg aagcctccac ctcatgtttt caaacttaat 56760
actggagact ggctgagaac ttacggacaa catcctttct gtctgaaaca aacagtcaca 56820
agcacaggaa gaggctgggg gactagaaag aggccctgcc ctctagaaag ctcagatctt 56880
ggcttctgtt actcatactc gggtgggctc cttagtcaga tgcctaaaac attttgccta 56940
aagctcgatg ggttctggag gacagtgtgg cttgtcacag gcctagagtc tgagggaggg 57000
gagtgggagt ctcagcaatc tcttggtctt ggctt6atgg caaccactgc tcacccttca 57060
acatgcctgg tttaggcagc agcttgggct gggaagaggt ggtggcagag Vctcaaagct 57120
gagatgctga gagagatagc tccctgagct gggccatctg acttctacct cccatgtttg 57180
ctctcccaac tcattagctc ctgggcagca tcctcctgag ccacatgtgc aggtactgga 57240
aaacctccat cttggctccc agagctctag gaactcttca tcacaactag atttgcctct 57300
tctaagtgtc tatgagcttg caccatattt aataaattgg gaatgggttt ggggtattaa 57360
tgcaatgtgt ggtggttgta ttggagcagg gggaattgat aaaggagagt ggttgctgtt 57420
aatattatct tatctattgg gtggtatgtg aaatattgta catagacctg atgagttgtg 57480
ggaccagatg tcatctctgg tcagagttta cttgctatat agactgtact tatgtgtgaa 57540
gtttgcaagc ttgctttagg gctgagccct ggactcccag cagcagcaca gttcagcatt 57600
gtgtggctgg ttgtttcctg gctgtcccca gcaagtgtag gagtggtggg cctgaactgg 57660
gccattgatc agactaaata cattaagcag ttaacataac tggcaatatg gagagtgaaa 57720
acatgattgg ctcagggaca taaatgtaga gggtctgcta gccaccttct ggcctagdcc 57780
acacaaactc cccatagcag agagttttca tgcacccaag tctaaaaccc tcaagcagac 57840
acccatctgc tctagagaat atgtacatcc cacctgaggc agccccttcc ttgcagcagg 57900
tgtgactgac tatgaccttt tcctggcctg gctctcacat gccagctgag tcattcctta 57960
ggagccctac cctttcatcc tctctatatg aatacttcca tagcctgggt atcctggctt 58020
gctttcctca gtgctgggtg ccacctttgc aatgggaaga aatgaatgca agtcacccca 58080
ccccttgtgt ttccttacaa gtgcttgaga ggagaagacc agtttcttct tgcttctgca 58140
tgtgggggat gtcgtagaag agtgaccatt gggaaggaca atgctatctg g^tagtgggg 58200
ccttgggcac aatataaatc tgtaaaccca aaggtgtttt ctcccaggca ctctcaaagc 58260
ttgcagaatc caacttaagg acagaatatg gttcccgaaa aaaactgatg atctggagta 58320
cgcattgctg gcagaaccac agagcaatgg ctgggcatgg gcagaggtca tctgggtgtt 58380
cctgaggctg ataacctgtg gctgaaatcc cttgctaaaa gtccaggaga cactcctgtt 58440
ggtatctttt cttctggagt catagtagtc accttgcagg gaacttcctc agcccagggc 58500
tgctgcaggc agcccagtga cccttcctcc tctgcagtta ttcccccttt ggctgctgca 58560
gcaccacccc cgtcacccac cacccaaccc ctgccgcact ccagccttta acaagggctg 58620
tctagatatt cattttaact acctccacct tggaaacaat tgctgaaggg gagaggattt 58680
gcaatgacca accaccttgt tgggacgcct gcacacctgt ctttcctgct tcaacctgaa 58740
agattcctga tgatgataat ctggacacag aagccgggca cggtggctct agcctgtaat 58800
ctcagcactt tgggaggcct cagcaggtgg atcacctgag atcaagagtt tgagaacagc 58860
ctgaccaaca tggtgaaacc ccgtctctac taaaaataca aaaattagcc aggtgtggtg 58920
gcacatacct gtaatcccag ctactctgga ggctgaggca ggagaatcgc ttgaacccac 58980
aaggcagagg ttgcagtgag gcgagatcat gccattgcac tccagcctgt gcaacaagag 59040 
ccaaactcca tctcaaaaaa aaaaa 59065 

<210> SEQ ID NO 4
<211> LENGTH: 265
<212> TYPE: PRT
<213> ORGANISM: Human

<400> SEQUENCE: 4

Leu Thr Glu Val Lys Val Met Arg Ser Leu Asp His Pro Asn Val Leu
1               5                   10                  15

Lys Phe Ile Gly Val Leu Tyr Lys Asp Lys Lys Leu Asn Leu Leu Thr
            20                  25                  30

Glu Tyr Ile Glu Gly Gly Thr Leu Lys Asp Phe Leu Arg Ser Met Asp
        35                  40                  45

Pro Phe Pro Trp Gln Gln Lys Val Arg Phe Ala Lys Gly Ile Ala Ser
    50                  55                  60

Gly Met Ala Tyr Leu His Ser Met Cys Ile Ile His Arg Asp Leu Asn
65                  70                  75                  80

Ser His Asn Cys Leu Ile Lys Leu Asp Lys Thr Val Val Val Ala Asp
                85                  90                  95

Phe Gly Leu Ser Arg Leu Ile Val Glu Glu Arg Lys Arg Ala Pro Met
            100                 105                 110

Glu Lys Ala Thr Thr Lys Lys Arg Thr Leu Arg Lys Asn Asp Arg Lys
        115                 120                 125

Lys Arg Tyr Thr Val Val Gly Asn Pro Tyr Trp Met Ala Pro Glu Met
    130                 135                 140

Leu Asn Gly Lys Ser Tyr Asp Glu Thr Val Asp Ile Phe Ser Phe Gly
145                 150                 155                 160

Ile Val Leu Cys Glu Ile Ile Gly Gln Val Tyr Ala Asp Pro Asp Cys
                165                 170                 175

Leu Pro Arg Thr Leu Asp Phe Gly Leu Asn Val Lys Leu Phe Trp Glu
            180                 185                 190

Lys Phe Val Pro Thr Asp Cys Pro Pro Ala Phe Phe Pro Leu Ala Ala
        195                 200                 205

Ile Cys Cys Arg Leu Glu Pro Glu Ser Arg Pro Ala Phe Ser Lys Leu
    210                 215                 220

Glu Asp Ser Phe Glu Ala Leu Ser Leu Tyr Leu Gly Glu Leu Gly Ile
225                 230                 235                 240

Pro Leu Pro Ala Glu Leu Glu Glu Leu Asp His Thr Val Ser Met Gln
                245                 250                 255

Tyr Gly Leu Thr Arg Asp Ser Pro Pro
            260                 265



That which is claimed is:

1. An isolated nucleic acid molecule consisting of a
nucleotide sequence selected from the group consisting of:

(a) a nucleotide sequence that encodes an amino acid
sequence shown in SEQ ID NO:2;

(b) a nucleic acid molecule of the nucleic acid
sequence of SEQ ID NO:1;

(c) a nucleic acid molecule consisting of the nucleic acid
sequence of SEQ ID NO:3; and

(d) a nucleotide sequence that is completely complementary
to a nucleotide sequence of (a)-(c).

2. A nucleic acid vector comprising a nucleic acid molecule
of claim 1.

3. A host cell containing the vector of claim 2.

4. A process for producing a polypeptide comprising
culturing the host cell of claim 3 under conditions sufficient
for the production of said polypeptide, and recovering the
peptide from the host cell culture.

5. An isolated polynucleotide consisting of a nucleotide
sequence set forth in SEQ ID NO:1.

6. An isolated polynucleotide consisting of a nucleotide
sequence set forth in SEQ ID NO:3.

7. A vector according to claim 2, wherein said vector is
selected from the group consisting of a plasmid, virus, and
bacteriophage.

8. A vector according to claim 2, wherein said isolated
nucleic acid molecule is inserted into said vector in proper
orientation and correct reading frame such that the protein of
SEQ ID NO:3 may be expressed by a cell transformed with
said vector.

9. A vector according to claim 8, wherein said isolated
nucleic acid molecule is operatively linked to a promoter
sequence.

11 Claims, No Drawings



6,043,052 



GPR 25 POLYNUCLEOTIDES 

FIELD OF THE INVENTION

This invention relates to newly identified polypeptides 
and polynucleotides encoding such polypeptides, to their use 
in therapy and in identifying compounds which may be 
agonists, antagonists and/or inhibitors which are potentially 
useful in therapy, and to production of such polypeptides and 
polynucleotides. 

BACKGROUND OF THE INVENTION 

The drug discovery process is currently undergoing a 
fundamental revolution as it embraces 'functional 
genomics', that is, high throughput genome- or gene-based 
biology. This approach is rapidly superseding earlier
approaches based on 'positional cloning'. A phenotype, that
is a biological function or genetic disease, would be iden- 
tified and this would then be tracked back to the responsible 
gene, based on its genetic map position. 

Functional genomics relies heavily on the various tools of 
bioinformatics to identify gene sequences of potential inter- 
est from the many molecular biology databases now avail- 
able. There is a continuing need to identify and characterise 
further genes and their related polypeptides/proteins, as 
targets for drug discovery. 

It is well established that many medically significant 
biological processes are mediated by proteins participating 
in signal transduction pathways that involve G-proteins 
and/or second messengers, e.g., cAMP (Lefkowitz, Nature,
1991, 351:353-354). Herein these proteins are referred to as
proteins participating in pathways with G-proteins or PPG
proteins. Some examples of these proteins include the GPC
receptors, such as those for adrenergic agents and dopamine
(Kobilka, B. K., et al., Proc. Natl. Acad. Sci. USA, 1987,
84:46-50; Kobilka, B. K., et al., Science, 1987,
238:650-656; Bunzow, J. R., et al., Nature, 1988,
336:783-787), G-proteins themselves, effector proteins,
e.g., phospholipase C, adenyl cyclase, and
phosphodiesterase, and actuator proteins, e.g., protein kinase
A and protein kinase C (Simon, M. I., et al., Science, 1991,
252:802-8).

For example, in one form of signal transduction, the effect 
of hormone binding is activation of the enzyme, adenylate 
cyclase, inside the cell. Enzyme activation by hormones is 
dependent on the presence of the nucleotide, GTP. GTP also 
influences hormone binding. A G-protein connects the hor- 
mone receptor to adenylate cyclase. G-protein was shown to 
exchange GTP for bound GDP when activated by a hormone
receptor. The GTP-carrying form then binds to activated 
adenylate cyclase. Hydrolysis of GTP to GDP, catalyzed by 
the G-protein itself, returns the G-protein to its basal, 
inactive form. Thus, the G-protein serves a dual role, as an 
intermediate that relays the signal from receptor to effector, 
and as a clock that controls the duration of the signal. 
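
The cycle just described can be sketched as a two-state model. This is a minimal illustration only: the names `GProteinState`, `on_receptor_activation`, `on_gtp_hydrolysis` and `relays_signal` are hypothetical labels introduced for this sketch and do not appear in the specification.

```python
from enum import Enum


class GProteinState(Enum):
    GDP_BOUND = "basal, inactive"
    GTP_BOUND = "active"


def on_receptor_activation(state: GProteinState) -> GProteinState:
    """A hormone-activated receptor prompts exchange of bound GDP for GTP."""
    return GProteinState.GTP_BOUND


def on_gtp_hydrolysis(state: GProteinState) -> GProteinState:
    """Hydrolysis of GTP to GDP returns the G-protein to its basal form."""
    return GProteinState.GDP_BOUND


def relays_signal(state: GProteinState) -> bool:
    """Only the GTP-carrying form passes the signal on to the effector."""
    return state is GProteinState.GTP_BOUND


state = GProteinState.GDP_BOUND
state = on_receptor_activation(state)  # relay role: signal switched on
signal_on = relays_signal(state)
state = on_gtp_hydrolysis(state)       # clock role: signal switched off
signal_off = not relays_signal(state)
```

The hydrolysis step is what bounds the duration of the signal: once it has run, the model is back in the GDP-bound state and no longer relays anything until the receptor is activated again.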

The membrane protein gene superfamily of G-protein 
coupled receptors has been characterized as having seven 
putative transmembrane domains. The domains are believed 
to represent transmembrane α-helices connected by extracellular
or cytoplasmic loops. G-protein coupled receptors
include a wide range of biologically active receptors, such as 
hormone, viral, growth factor and neuroreceptors. 

G-protein coupled receptors (otherwise known as 7TM
receptors) have been characterized as including these seven
conserved hydrophobic stretches of about 20 to 30 amino
acids, connecting at least eight divergent hydrophilic loops.
The G-protein family of coupled receptors includes dopamine
receptors which bind to neuroleptic drugs used for
treating psychotic and neurological disorders. Other 
examples of members of this family include, but are not 
limited to, calcitonin, adrenergic, endothelin, cAMP, 
adenosine, muscarinic, acetylcholine, serotonin, histamine, 
thrombin, kinin, follicle stimulating hormone, opsins, endot- 
helial differentiation gene-1, rhodopsins, odorant, and 
cytomegalovirus receptors. Most G-protein coupled recep- 
tors have single conserved cysteine residues in each of the 
first two extracellular loops which form disulfide bonds that 
are believed to stabilize functional protein structure. The 7
transmembrane regions are designated as TM1, TM2, TM3,
TM4, TM5, TM6, and TM7. TM3 has been implicated in 
signal transduction. 

Phosphorylation and lipidation (palmitylation or
farnesylation) of cysteine residues can influence signal 
transduction of some G-protein coupled receptors. Most 
G-protein coupled receptors contain potential phosphoryla- 
tion sites within the third cytoplasmic loop and/or the 
carboxy terminus. For several G-protein coupled receptors, 
such as the β-adrenoreceptor, phosphorylation by protein
kinase A and/or specific receptor kinases mediates receptor 
desensitization. 

For some receptors, the ligand binding sites of G-protein 
coupled receptors are believed to comprise hydrophilic 
sockets formed by several G-protein coupled receptor trans- 
membrane domains, said sockets being surrounded by 
hydrophobic residues of the G-protein coupled receptors. 
The hydrophilic side of each G-protein coupled receptor 
transmembrane helix is postulated to face inward and form 
a polar ligand binding site. TM3 has been implicated in 
several G-protein coupled receptors as having a ligand 
binding site, such as the TM3 aspartate residue. TM5 
serines, a TM6 asparagine and TM6 or TM7 phenylalanines 
or tyrosines are also implicated in ligand binding.

G-protein coupled receptors can be intracellularly 
coupled by heterotrimeric G-proteins to various intracellular 
enzymes, ion channels and transporters (see, Johnson et al., 
Endoc. Rev., 1989, 10:317-331). Different G-protein 
α-subunits preferentially stimulate particular effectors to
modulate various biological functions in a cell. Phosphory- 
lation of cytoplasmic residues of G-protein coupled recep- 
tors has been identified as an important mechanism for the 
regulation of G-protein coupling of some G-protein coupled
receptors. G-protein coupled receptors are found in numerous
sites within a mammalian host. Over the past 15 years,
nearly 350 therapeutic agents targeting 7 transmembrane (7 
TM) receptors have been successfully introduced onto the 
market. 

SUMMARY OF THE INVENTION

The present invention relates to GPR25, in particular
GPR25 polypeptides and GPR25 polynucleotides, recombinant
materials and methods for their production. In another
aspect, the invention relates to methods for using such
polypeptides and polynucleotides, including the treatment of
infections such as bacterial, fungal, protozoan and viral
infections, particularly infections caused by HIV-1 or HIV-2;
pain; cancers; diabetes, obesity; anorexia; bulimia; asthma;
Parkinson's disease; acute heart failure; hypotension;
hypertension; urinary retention; osteoporosis; angina pectoris;
myocardial infarction; stroke; ulcers; asthma; allergies;
benign prostatic hypertrophy; migraine; vomiting; psychotic
and neurological disorders, including anxiety,
schizophrenia, manic depression, depression, delirium,
dementia, and severe mental retardation; and dyskinesias,
such as Huntington's disease or Gilles de la Tourette's
syndrome, hereinafter referred to as "the Diseases", amongst
others. In a further aspect, the invention relates to methods
for identifying agonists and antagonists/inhibitors using the
materials provided by the invention, and treating conditions
associated with GPR25 imbalance with the identified
compounds. In a still further aspect, the invention relates to
diagnostic assays for detecting diseases associated with
inappropriate GPR25 activity or levels.

DESCRIPTION OF THE INVENTION

In a first aspect, the present invention relates to GPR25
polypeptides. Such peptides include isolated polypeptides
comprising an amino acid sequence which has at least 70%
identity, preferably at least 80% identity, more preferably at
least 90% identity, yet more preferably at least 95% identity,
most preferably at least 97-99% identity, to that of SEQ ID
NO:2 over the entire length of SEQ ID NO:2. Such
polypeptides include those comprising the amino acid of SEQ ID
NO:2.

Further peptides of the present invention include isolated
polypeptides in which the amino acid sequence has at least
70% identity, preferably at least 80% identity, more
preferably at least 90% identity, yet more preferably at least 95%
identity, most preferably at least 97-99% identity, to the
amino acid sequence of SEQ ID NO:2 over the entire length
of SEQ ID NO:2. Such polypeptides include the polypeptide
of SEQ ID NO:2.

Further peptides of the present invention include isolated
polypeptides encoded by a polynucleotide comprising the
sequence contained in SEQ ID NO:1.

Polypeptides of the present invention are believed to be
members of the G-protein coupled, 7 transmembrane receptor
gene family of polypeptides. They are therefore of
interest because this protein sequence, which was obtained
by PCR amplification of spleen cDNA, has six amino acid
differences from the published sequence for GPR25 having
GenBank Accession No. U91939, which published sequence
is believed to contain errors. These properties are hereinafter
referred to as "GPR25 activity" or "GPR25 polypeptide
activity" or "biological activity of GPR25". Also included
amongst these activities are antigenic and immunogenic
activities of said GPR25 polypeptides, in particular the
antigenic and immunogenic activities of the polypeptide of
SEQ ID NO:2. Preferably, a polypeptide of the present
invention exhibits at least one biological activity of GPR25.

The polypeptides of the present invention may be in the
form of the "mature" protein or may be a part of a larger
protein such as a fusion protein. It is often advantageous to
include an additional amino acid sequence which contains
secretory or leader sequences, pro-sequences, sequences
which aid in purification such as multiple histidine residues,
or an additional sequence for stability during recombinant
production.

The present invention also includes variants of the
aforementioned polypeptides, that is polypeptides that vary
from the referents by conservative amino acid substitutions,
whereby a residue is substituted by another with like
characteristics. Typical such substitutions are among Ala, Val,
Leu and Ile; among Ser and Thr; among the acidic residues
Asp and Glu; among Asn and Gln; and among the basic
residues Lys and Arg; or aromatic residues Phe and Tyr.
Particularly preferred are variants in which several, 5-10,
1-5, 1-3, 1-2 or 1 amino acids are substituted, deleted, or
added in any combination.

Polypeptides of the present invention can be prepared in
any suitable manner. Such polypeptides include isolated
naturally occurring polypeptides, recombinantly produced
polypeptides, synthetically produced polypeptides, or
polypeptides produced by a combination of these methods.
Means for preparing such polypeptides are well understood
in the art.

In a further aspect, the present invention relates to GPR25
polynucleotides. Such polynucleotides include isolated
polynucleotides comprising a nucleotide sequence encoding
a polypeptide which has at least 70% identity, preferably at
least 80% identity, more preferably at least 90% identity, yet
more preferably at least 95% identity, to the amino acid
sequence of SEQ ID NO:2, over the entire length of SEQ ID
NO:2. In this regard, polypeptides which have at least 97%
identity are highly preferred, whilst those with at least
98-99% identity are more highly preferred, and those with
at least 99% identity are most highly preferred. Such
polynucleotides include a polynucleotide comprising the
nucleotide sequence contained in SEQ ID NO:1 encoding the
polypeptide of SEQ ID NO:2.

Further polynucleotides of the present invention include
isolated polynucleotides comprising a nucleotide sequence
that has at least 70% identity, preferably at least 80%
identity, more preferably at least 90% identity, yet more
preferably at least 95% identity, to a nucleotide sequence
encoding a polypeptide of SEQ ID NO:2, over the entire
coding region. In this regard, polynucleotides which have at
least 97% identity are highly preferred, whilst those with at
least 98-99% identity are more highly preferred, and those
with at least 99% identity are most highly preferred.

Further polynucleotides of the present invention include
isolated polynucleotides comprising a nucleotide sequence
which has at least 70% identity, preferably at least 80%
identity, more preferably at least 90% identity, yet more
preferably at least 95% identity, to SEQ ID NO:1 over the
entire length of SEQ ID NO:1. In this regard,
polynucleotides which have at least 97% identity are highly preferred,
whilst those with at least 98-99% identity are more highly
preferred, and those with at least 99% identity are most
highly preferred. Such polynucleotides include a
polynucleotide comprising the polynucleotide of SEQ ID NO:1 as
well as the polynucleotide of SEQ ID NO:1.

The invention also provides polynucleotides which are
complementary to all the above described polynucleotides.

The nucleotide sequence of SEQ ID NO:1 shows homology
with GPR25 (Jung, B. P. et al., Biochem. Biophys. Res.
Commun. 230 (1), 69-72 (1997)). The nucleotide sequence
of SEQ ID NO:1 is a cDNA sequence and comprises a
polypeptide encoding sequence (nucleotides 79 to 1161)
encoding a polypeptide of 361 amino acids, the polypeptide
of SEQ ID NO:2.

The nucleotide sequence encoding the polypeptide of
SEQ ID NO:2 may be identical to the polypeptide encoding
sequence contained in SEQ ID NO:1 or it may be a sequence
other than the one contained in SEQ ID NO:1, which, as a
result of the redundancy (degeneracy) of the genetic code,
also encodes the polypeptide of SEQ ID NO:2. The
polypeptide of the SEQ ID NO:2 is structurally related to other
proteins of the G-protein coupled, 7 transmembrane receptor
gene family, having homology and/or structural similarity
with GPR25 (Jung, B. P. et al., Biochem. Biophys. Res.
Commun. 230 (1), 69-72 (1997)).

Preferred polypeptides and polynucleotides of the present
invention are expected to have, inter alia, similar biological
functions/properties to their homologous polypeptides and
polynucleotides. Furthermore, preferred polypeptides and

polynucleotides of the present invention have at least one
GPR25 activity.

Polynucleotides of the present invention may be obtained,
using standard cloning and screening techniques, from a
cDNA library derived from mRNA in cells of human spleen,
using the expressed sequence tag (EST) analysis (Adams,
M. D., et al., Science (1991) 252:1651-1656; Adams, M. D.
et al., Nature (1992) 355:632-634; Adams, M. D., et al.,
Nature (1995) 377 Supp:3-174). Polynucleotides of the
invention can also be obtained from natural sources such as
genomic DNA libraries or can be synthesized using well
known and commercially available techniques.

When polynucleotides of the present invention are used
for the recombinant production of polypeptides of the
present invention, the polynucleotide may include the coding
sequence for the mature polypeptide, by itself; or the
coding sequence for the mature polypeptide in reading frame
with other coding sequences, such as those encoding a leader
or secretory sequence, a pre-, or pro- or prepro- protein
sequence, or other fusion peptide portions. For example, a
marker sequence which facilitates purification of the fused
polypeptide can be encoded. In certain preferred
embodiments of this aspect of the invention, the marker sequence is
a hexa-histidine peptide, as provided in the pQE vector
(Qiagen, Inc.) and described in Gentz et al., Proc Natl Acad
Sci USA (1989) 86:821-824, or is an HA tag. The
polynucleotide may also contain non-coding 5' and 3' sequences,
such as transcribed, non-translated sequences, splicing and
polyadenylation signals, ribosome binding sites and
sequences that stabilize mRNA.

Further embodiments of the present invention include
polynucleotides encoding polypeptide variants which
comprise the amino acid sequence of SEQ ID NO:2 and in which
several, for instance from 5 to 10, 1 to 5, 1 to 3, 1 to 2 or 1,
amino acid residues are substituted, deleted or added, in any
combination.

Polynucleotides which are identical or sufficiently
identical to a nucleotide sequence contained in SEQ ID NO:1,
may be used as hybridization probes for cDNA and genomic
DNA or as primers for a nucleic acid amplification (PCR)
reaction, to isolate full-length cDNAs and genomic clones
encoding polypeptides of the present invention and to isolate
cDNA and genomic clones of other genes (including genes
encoding homologs and orthologs from species other than
human) that have a high sequence similarity to SEQ ID
NO:1. Typically these nucleotide sequences are 70%
identical, preferably 80% identical, more preferably 90%
identical, most preferably 95% identical to that of the
referent. The probes or primers will generally comprise at
least 15 nucleotides, preferably, at least 30 nucleotides and
may have at least 50 nucleotides. Particularly preferred
probes will have between 30 and 50 nucleotides.

A polynucleotide encoding a polypeptide of the present
invention, including homologs and orthologs from species
other than human, may be obtained by a process which
comprises the steps of screening an appropriate library under
stringent hybridization conditions with a labeled probe
having the sequence of SEQ ID NO:1 or a fragment thereof; and
isolating full-length cDNA and genomic clones containing
said polynucleotide sequence. Such hybridization
techniques are well known to the skilled artisan. Preferred
stringent hybridization conditions include overnight
incubation at 42° C. in a solution comprising: 50% formamide,
5xSSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM
sodium phosphate (pH 7.6), 5x Denhardt's solution, 10%
dextran sulfate, and 20 microgram/ml denatured, sheared
salmon sperm DNA; followed by washing the filters in 0.1x
SSC at about 65° C. Thus the present invention also includes
polynucleotides obtainable by screening an appropriate
library under stringent hybridization conditions with a
labeled probe having the sequence of SEQ ID NO:1 or a
fragment thereof.

The skilled artisan will appreciate that, in many cases, an
isolated cDNA sequence will be incomplete, in that the
region coding for the polypeptide is cut short at the 5' end of
the cDNA. This is a consequence of reverse transcriptase, an
enzyme with inherently low 'processivity' (a measure of the
ability of the enzyme to remain attached to the template
during the polymerisation reaction), failing to complete a
DNA copy of the mRNA template during 1st strand cDNA
synthesis.

There are several methods available and well known to
those skilled in the art to obtain full-length cDNAs, or
extend short cDNAs, for example those based on the method
of Rapid Amplification of cDNA ends (RACE) (see, for
example, Frohman et al., PNAS USA 85, 8998-9002, 1988).
Recent modifications of the technique, exemplified by the
Marathon™ technology (Clontech Laboratories Inc.) for
example, have significantly simplified the search for longer
cDNAs. In the Marathon™ technology, cDNAs have been
prepared from mRNA extracted from a chosen tissue and an
'adaptor' sequence ligated onto each end. Nucleic acid
amplification (PCR) is then carried out to amplify the
'missing' 5' end of the cDNA using a combination of gene
specific and adaptor specific oligonucleotide primers. The
PCR reaction is then repeated using 'nested' primers, that is,
primers designed to anneal within the amplified product
(typically an adaptor specific primer that anneals further 3'
in the adaptor sequence and a gene specific primer that
anneals further 5' in the known gene sequence). The
products of this reaction can then be analysed by DNA
sequencing and a full-length cDNA constructed either by joining the
product directly to the existing cDNA to give a complete
sequence, or carrying out a separate full-length PCR using
the new sequence information for the design of the 5' primer.

Recombinant polypeptides of the present invention may
be prepared by processes well known in the art from
genetically engineered host cells comprising expression
systems. Accordingly, in a further aspect, the present
invention relates to expression systems which comprise a
polynucleotide or polynucleotides of the present invention, to
host cells which are genetically engineered with such
expression systems and to the production of polypeptides of
the invention by recombinant techniques. Cell-free
translation systems can also be employed to produce such proteins
using RNAs derived from the DNA constructs of the present
invention.

For recombinant production, host cells can be genetically
engineered to incorporate expression systems or portions
thereof for polynucleotides of the present invention.
Introduction of polynucleotides into host cells can be effected by
methods described in many standard laboratory manuals,
such as Davis et al., Basic Methods in Molecular Biology
(1986) and Sambrook et al., Molecular Cloning: A
Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y. (1989). Preferred such methods
include, for instance, calcium phosphate transfection,
DEAE-dextran mediated transfection, transvection,
microinjection, cationic lipid-mediated transfection,
electroporation, transduction, scrape loading, ballistic
introduction or infection.

Representative examples of appropriate hosts include
bacterial cells, such as streptococci, staphylococci, E. coli,
Streptomyces and Bacillus subtilis cells; fungal cells, such as
yeast cells and Aspergillus cells; insect cells such as
Drosophila S2 and Spodoptera Sf9 cells; animal cells such as
CHO, COS, HeLa, C127, 3T3, BHK, HEK 293 and Bowes
melanoma cells; and plant cells.

A great variety of expression systems can be used, for
instance, chromosomal, episomal and virus-derived 
systems, e.g., vectors derived from bacterial plasmids, from 
bacteriophage, from transposons, from yeast episomes, from
insertion elements, from yeast chromosomal elements, from
viruses such as baculoviruses, papova viruses, such as
SV40, vaccinia viruses, adenoviruses, fowl pox viruses, 
pseudorabies viruses and retroviruses, and vectors derived 
from combinations thereof, such as those derived from 
plasmid and bacteriophage genetic elements, such as 
cosmids and phagemids. The expression systems may con- 
tain control regions that regulate as well as engender expres- 
sion. Generally, any system or vector which is able to 
maintain, propagate or express a polynucleotide to produce 
a polypeptide in a host may be used. The appropriate
nucleotide sequence may be inserted into an expression 
system by any of a variety of well-known and routine 
techniques, such as, for example, those set forth in Sam- 
brook et al., MOLECULAR CLONING, A LABORATORY 
MANUAL (supra). Appropriate secretion signals may be
incorporated into the desired polypeptide to allow secretion 
of the translated protein into the lumen of the endoplasmic 
reticulum, the periplasmic space or the extracellular environment.
These signals may be endogenous to the polypeptide or they
may be heterologous signals.

If a polypeptide of the present invention is to be expressed 
for use in screening assays, it is generally preferred that the 
polypeptide be produced at the surface of the cell. In this 
event, the cells may be harvested prior to use in the 
screening assay. If the polypeptide is secreted into the
medium, the medium can be recovered in order to recover
and purify the polypeptide. If produced intracellularly, the
cells must first be lysed before the polypeptide is recovered. 

Polypeptides of the present invention can be recovered 
and purified from recombinant cell cultures by well-known 
methods including ammonium sulfate or ethanol 
precipitation, acid extraction, anion or cation exchange 
chromatography, phosphocellulose chromatography, hydro- 
phobic interaction chromatography, affinity 
chromatography, hydroxylapatite chromatography and lec- 
tin chromatography. Most preferably, high performance liq- 
uid chromatography is employed for purification. Well 
known techniques for refolding proteins may be employed 
to regenerate active conformation when the polypeptide is 
denatured during isolation and/or purification.

This invention also relates to the use of polynucleotides of 
the present invention as diagnostic reagents. Detection of a 
mutated form of the gene characterised by the polynucleotide
of SEQ ID NO:1 which is associated with a dysfunction
will provide a diagnostic tool that can add to, or define,
a diagnosis of a disease, or susceptibility to a disease, which 
results from under-expression, over-expression or altered 
expression of the gene. Individuals carrying mutations in the 
gene may be detected at the DNA level by a variety of
techniques. 

Nucleic acids for diagnosis may be obtained from a
subject's cells, such as from blood, urine, saliva, tissue
biopsy or autopsy material. The genomic DNA may be used
directly for detection or may be amplified enzymatically by
using PCR or other amplification techniques prior to analysis.
RNA or cDNA may also be used in similar fashion.
Deletions and insertions can be detected by a change in size
of the amplified product in comparison to the normal genotype.
Point mutations can be identified by hybridizing amplified
DNA to labeled GPR25 nucleotide sequences. Perfectly
matched sequences can be distinguished from mismatched
duplexes by RNase digestion or by differences in melting
temperatures. DNA sequence differences may also be
detected by alterations in electrophoretic mobility of DNA
fragments in gels, with or without denaturing agents, or by
direct DNA sequencing (see, e.g., Myers et al., Science
(1985) 230:1242). Sequence changes at specific locations
may also be revealed by nuclease protection assays, such as
RNase and S1 protection or the chemical cleavage method
(see Cotton et al., Proc Natl Acad Sci USA (1985)
85:4397-4401). In another embodiment, an array of
oligonucleotide probes comprising GPR25 nucleotide sequence
or fragments thereof can be constructed to conduct efficient
screening of, e.g., genetic mutations. Array technology methods
are well known and have general applicability and can
be used to address a variety of questions in molecular
genetics including gene expression, genetic linkage, and
genetic variability (see for example: M. Chee et al., Science,
Vol 274, pp 610-613 (1996)).
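
The size-based detection of deletions and insertions mentioned above amounts to comparing the length of an amplified product with the length expected for the normal genotype. A minimal sketch of that comparison follows; the function name `classify_amplicon`, the `tolerance_bp` parameter and the example fragment sizes are assumptions made for illustration and are not taken from the specification.

```python
def classify_amplicon(reference_bp: int, observed_bp: int, tolerance_bp: int = 0) -> str:
    """Compare an observed amplified-product size with the normal-genotype size.

    tolerance_bp is a hypothetical allowance for sizing error on a gel.
    """
    if reference_bp <= 0 or observed_bp <= 0:
        raise ValueError("fragment sizes must be positive")
    difference = observed_bp - reference_bp
    if abs(difference) <= tolerance_bp:
        return "no size change"
    return "insertion" if difference > 0 else "deletion"


# Hypothetical fragment sizes in base pairs.
shorter = classify_amplicon(reference_bp=300, observed_bp=288)
longer = classify_amplicon(reference_bp=300, observed_bp=312)
same = classify_amplicon(reference_bp=300, observed_bp=301, tolerance_bp=2)
```

A point mutation leaves the product size unchanged, which is why the text turns to hybridization, melting-temperature and sequencing methods for that case.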

The diagnostic assays offer a process for diagnosing or 
determining a susceptibility to the Diseases through detec- 
tion of mutation in the GPR25 gene by the methods 
described. In addition, such diseases may be diagnosed by 
methods comprising determining from a sample derived 
from a subject an abnormally decreased or increased level of 
polypeptide or mRNA. Decreased or increased expression 
can be measured at the RNA level using any of the methods 
well known in the art for the quantitation of polynucleotides, 
such as, for example, nucleic acid amplification, for instance 
PCR, RT-PCR, RNase protection, Northern blotting and 
other hybridization methods. Assay techniques that can be 
used to determine levels of a protein, such as a polypeptide 
of the present invention, in a sample derived from a host are 
well-known to those of skill in the art. Such assay methods 
include radioimmunoassays, competitive-binding assays,
Western Blot analysis and ELISA assays.
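
Whichever assay is used, calling a level "abnormally decreased or increased" implies a comparison against a reference level. The sketch below shows one simple way such a comparison could be expressed; the function name `classify_expression` and the two-fold default threshold are assumptions chosen for illustration, since the specification does not state any cut-off.

```python
def classify_expression(sample_level: float, reference_level: float,
                        fold_threshold: float = 2.0) -> str:
    """Classify a measured level relative to a reference level.

    fold_threshold is a hypothetical cut-off; the source gives no value.
    """
    if reference_level <= 0 or sample_level < 0:
        raise ValueError("levels must be non-negative and the reference positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")
    ratio = sample_level / reference_level
    if ratio >= fold_threshold:
        return "increased"
    if ratio <= 1 / fold_threshold:
        return "decreased"
    return "within reference range"


# Hypothetical measurements in arbitrary units.
high = classify_expression(sample_level=9.0, reference_level=3.0)
low = classify_expression(sample_level=1.0, reference_level=3.0)
normal = classify_expression(sample_level=3.5, reference_level=3.0)
```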

Thus in another aspect, the present invention relates to a 
diagnostic kit which comprises:

(a) a polynucleotide of the present invention, preferably 
the nucleotide sequence of SEQ ID NO: 1, or a fragment 
thereof; 

(b) a nucleotide sequence complementary to that of (a); 

(c) a polypeptide of the present invention, preferably the 
polypeptide of SEQ ID NO:2 or a fragment thereof; or

(d) an antibody to a polypeptide of the present invention, 
preferably to the polypeptide of SEQ ID NO:2.

It will be appreciated that in any such kit, (a), (b), (c) or 
(d) may comprise a substantial component. Such a kit will 
be of use in diagnosing a disease or susceptibility to a
disease, particularly infections such as bacterial, fungal, 
protozoan and viral infections, particularly infections caused 
by HIV-1 or HIV-2; pain; cancers; diabetes, obesity; anor- 
exia; bulimia; asthma; Parkinson's disease; acute heart fail- 
ure; hypotension; hypertension; urinary retention; 
osteoporosis; angina pectoris; myocardial infarction; stroke; 
ulcers; asthma; allergies; benign prostatic hypertrophy; 
migraine; vomiting; psychotic and neurological disorders, 
including anxiety, schizophrenia, manic depression, 
depression, delirium, dementia, and severe mental retardation;
and dyskinesias, such as Huntington's disease or Gilles
de la Tourette's syndrome, amongst others.

The nucleotide sequences of the present invention are also 
valuable for chromosome identification. The sequence is 
specifically targeted to, and can hybridize with, a particular
location on an individual human chromosome. The mapping
of relevant sequences to chromosomes according to the 
present invention is an important first step in correlating 
those sequences with gene associated disease. Once a 
sequence has been mapped to a precise chromosomal 
location, the physical position of the sequence on the chro- 
mosome can be correlated with genetic map data. Such data 
are found in, for example, V. McKusick, Mendelian Inheritance
in Man (available on-line through Johns Hopkins
University Welch Medical Library). The relationship
between genes and diseases that have been mapped to the
same chromosomal region is then identified through linkage
analysis (coinheritance of physically adjacent genes).

The differences in the cDNA or genomic sequence 
between affected and unaffected individuals can also be 
determined. If a mutation is observed in some or all of the 
affected individuals but not in any normal individuals, then 
the mutation is likely to be the causative agent of the disease. 

The polypeptides of the invention or their fragments or 
analogs thereof, or cells expressing them, can also be used 
as immunogens to produce antibodies immunospecific for 
polypeptides of the present invention. The term "immunospecific"
means that the antibodies have substantially greater
affinity for the polypeptides of the invention than their
affinity for other related polypeptides in the prior art.

Antibodies generated against polypeptides of the present 
invention may be obtained by administering the polypep- 
tides or epitope-bearing fragments, analogs or cells to an 
animal, preferably a non-human animal, using routine pro- 
tocols. For preparation of monoclonal antibodies, any tech- 
nique which provides antibodies produced by continuous 
cell line cultures can be used. Examples include the hybridoma
technique (Kohler, G. and Milstein, C., Nature (1975)
256:495-497), the trioma technique, the human B-cell
hybridoma technique (Kozbor et al., Immunology Today
(1983) 4:72) and the EBV-hybridoma technique (Cole et al.,
MONOCLONAL ANTIBODIES AND CANCER 
THERAPY, pp. 77-96, Alan R. Liss, Inc., 1985). 

Techniques for the production of single chain antibodies, 
such as those described in U.S. Pat. No. 4,946,778, can also 
be adapted to produce single chain antibodies to polypep- 
tides of this invention. Also, transgenic mice, or other 
organisms, including other mammals, may be used to 
express humanized antibodies. 

The above-described antibodies may be employed to 
isolate or to identify clones expressing the polypeptide or to 
purify the polypeptides by affinity chromatography.

Antibodies against polypeptides of the present invention 
may also be employed to treat the Diseases, amongst others. 

In a further aspect, the present invention relates to geneti- 
cally engineered soluble fusion proteins comprising a 
polypeptide of the present invention, or a fragment thereof, 
and various portions of the constant regions of heavy or light 
chains of immunoglobulins of various subclasses (IgG, IgM, 
IgA, IgE). Preferred as an immunoglobulin is the constant
part of the heavy chain of human IgG, particularly IgG1,
where fusion takes place at the hinge region. In a particular 
embodiment, the Fc part can be removed simply by incor- 
poration of a cleavage sequence which can be cleaved with 
blood clotting factor Xa. Furthermore, this invention relates 
to processes for the preparation of these fusion proteins by 
genetic engineering, and to the use thereof for drug 
screening, diagnosis and therapy. A further aspect of the 
invention also relates to polynucleotides encoding such 
fusion proteins. Examples of fusion protein technology can 
be found in International Patent Application Nos. WO94/29458
and WO94/22914.

Another aspect of the invention relates to a method for
inducing an immunological response in a mammal which
comprises inoculating the mammal with a polypeptide of the
present invention, adequate to produce antibody and/or T
cell immune response to protect said animal from the
Diseases hereinbefore mentioned, amongst others. Yet
another aspect of the invention relates to a method of
inducing immunological response in a mammal which
comprises delivering a polypeptide of the present invention
via a vector directing expression of the polynucleotide and
coding for the polypeptide in vivo in order to induce such an
immunological response to produce antibody to protect said
animal from diseases.

A further aspect of the invention relates to an
immunological/vaccine formulation (composition) which,
when introduced into a mammalian host, induces an immunological
response in that mammal to a polypeptide of the
present invention wherein the composition comprises a
polypeptide or polynucleotide of the present invention. The
vaccine formulation may further comprise a suitable carrier.
Since a polypeptide may be broken down in the stomach, it
is preferably administered parenterally (for instance, by
subcutaneous, intramuscular, intravenous, or intradermal
injection). Formulations suitable for parenteral administration
include aqueous and non-aqueous sterile injection solutions
which may contain anti-oxidants, buffers, bacteriostats
and solutes which render the formulation isotonic with the
blood of the recipient; and aqueous and non-aqueous sterile
suspensions which may include suspending agents or thickening
agents. The formulations may be presented in unit-dose
or multi-dose containers, for example, sealed ampoules
and vials and may be stored in a freeze-dried condition
requiring only the addition of the sterile liquid carrier
immediately prior to use. The vaccine formulation may also
include adjuvant systems for enhancing the immunogenicity
of the formulation, such as oil-in-water systems and other
systems known in the art. The dosage will depend on the
specific activity of the vaccine and can be readily determined
by routine experimentation.

Polypeptides of the present invention are responsible for
many biological functions, including many disease states, in
particular the Diseases hereinbefore mentioned. It is therefore
desirable to devise screening methods to identify compounds
which stimulate or which inhibit the function of the
polypeptide. Accordingly, in a further aspect, the present
invention provides for a method of screening compounds to
identify those which stimulate or which inhibit the function
of the polypeptide. In general, agonists or antagonists may
be employed for therapeutic and prophylactic purposes for
such Diseases as hereinbefore mentioned. Compounds may
be identified from a variety of sources, for example, cells,
cell-free preparations, chemical libraries, and natural product
mixtures. Such agonists, antagonists or inhibitors
so-identified may be natural or modified substrates, ligands,
receptors, enzymes, etc., as the case may be, of the polypeptide;
or may be structural or functional mimetics thereof (see
Coligan et al., Current Protocols in Immunology
1(2):Chapter 5 (1991)).

The screening method may simply measure the binding of
a candidate compound to the polypeptide, or to cells or
membranes bearing the polypeptide, or a fusion protein
thereof by means of a label directly or indirectly associated
with the candidate compound. Alternatively, the screening
method may involve competition with a labeled competitor.
Further, these screening methods may test whether the
candidate compound results in a signal generated by activation
or inhibition of the polypeptide, using detection
systems appropriate to the cells bearing the polypeptide.
Inhibitors of activation are generally assayed in the presence
of a known agonist and the effect on activation by the
agonist by the presence of the candidate compound is
observed. Constitutively active polypeptides may be
employed in screening methods for inverse agonists or
inhibitors, in the absence of an agonist or inhibitor, by
testing whether the candidate compound results in inhibition
of activation of the polypeptide. Further, the screening
methods may simply comprise the steps of mixing a candidate
compound with a solution containing a polypeptide of
the present invention, to form a mixture, measuring GPR25
activity in the mixture, and comparing the GPR25 activity of
the mixture to a standard. Fusion proteins, such as those
made from Fc portion and GPR25 polypeptide, as hereinbefore
described, can also be used for high-throughput
screening assays to identify antagonists for the polypeptide
of the present invention (see D. Bennett et al., J Mol
Recognition, 8:52-58 (1995); and K. Johanson et al., J Biol
Chem, 270(16):9459-9471 (1995)).

The polynucleotides, polypeptides and antibodies to the
polypeptide of the present invention may also be used to
configure screening methods for detecting the effect of
added compounds on the production of mRNA and polypeptide
in cells. For example, an ELISA assay may be constructed
for measuring secreted or cell associated levels of
polypeptide using monoclonal and polyclonal antibodies by
standard methods known in the art. This can be used to
discover agents which may inhibit or enhance the production
of polypeptide (also called antagonist or agonist,
respectively) from suitably manipulated cells or tissues.

The polypeptide may be used to identify membrane bound
or soluble receptors, if any, through standard receptor binding
techniques known in the art. These include, but are not
limited to, ligand binding and crosslinking assays in which
the polypeptide is labeled with a radioactive isotope (for
instance, 125I), chemically modified (for instance,
biotinylated), or fused to a peptide sequence suitable for
detection or purification, and incubated with a source of the
putative receptor (cells, cell membranes, cell supernatants,
tissue extracts, bodily fluids). Other methods include biophysical
techniques such as surface plasmon resonance and
spectroscopy. These screening methods may also be used to
identify agonists and antagonists of the polypeptide which
compete with the binding of the polypeptide to its receptors,
if any. Standard methods for conducting such assays are well
understood in the art.

Examples of potential polypeptide antagonists include
antibodies or, in some cases, oligonucleotides or proteins
which are closely related to the ligands, substrates,
receptors, enzymes, etc., as the case may be, of the
polypeptide, e.g., a fragment of the ligands, substrates,
receptors, enzymes, etc.; or small molecules which bind to
the polypeptide of the present invention but do not elicit a
response, so that the activity of the polypeptide is prevented.

Thus, in another aspect, the present invention relates to a
screening kit for identifying agonists, antagonists, ligands,
receptors, substrates, enzymes, etc. for polypeptides of the
present invention; or compounds which decrease or enhance
the production of such polypeptides, which comprises:

(a) a polypeptide of the present invention;

(b) a recombinant cell expressing a polypeptide of the
present invention;

(c) a cell membrane expressing a polypeptide of the
present invention; or

(d) antibody to a polypeptide of the present invention;
which polypeptide is preferably that of SEQ ID NO:2.

It will be appreciated that in any such kit, (a), (b), (c) or
(d) may comprise a substantial component.

It will be readily appreciated by the skilled artisan that a
polypeptide of the present invention may also be used in a
method for the structure-based design of an agonist, antagonist
or inhibitor of the polypeptide, by:

(a) determining in the first instance the three-dimensional
structure of the polypeptide;

(b) deducing the three-dimensional structure for the likely
reactive or binding site(s) of an agonist, antagonist or
inhibitor;

(c) synthesizing candidate compounds that are predicted to
bind to or react with the deduced binding or reactive
site; and

(d) testing whether the candidate compounds are indeed
agonists, antagonists or inhibitors.

It will be further appreciated that this will normally be an
iterative process.

In a further aspect, the present invention provides methods
of treating abnormal conditions such as, for instance,
infections such as bacterial, fungal, protozoan and viral
infections, particularly infections caused by HIV-1 or HIV-2;
pain; cancers; diabetes, obesity; anorexia; bulimia;
asthma; Parkinson's disease; acute heart failure; hypotension;
hypertension; urinary retention; osteoporosis; angina
pectoris; myocardial infarction; stroke; ulcers; asthma; allergies;
benign prostatic hypertrophy; migraine; vomiting; psychotic
and neurological disorders, including anxiety,
schizophrenia, manic depression, depression, delirium,
dementia, and severe mental retardation; and dyskinesias,
such as Huntington's disease or Gilles de la Tourette's
syndrome, related to either an excess of, or an under-expression
of, GPR25 polypeptide activity.

If the activity of the polypeptide is in excess, several
approaches are available. One approach comprises administering
to a subject in need thereof an inhibitor compound
(antagonist) as hereinabove described, optionally in combination
with a pharmaceutically acceptable carrier, in an
amount effective to inhibit the function of the polypeptide,
such as, for example, by blocking the binding of ligands,
substrates, receptors, enzymes, etc., or by inhibiting a second
signal, and thereby alleviating the abnormal condition.
In another approach, soluble forms of the polypeptides still
capable of binding the ligand, substrate, enzymes, receptors,
etc. in competition with endogenous polypeptide may be
administered. Typical examples of such competitors include
fragments of the GPR25 polypeptide.

In still another approach, expression of the gene encoding
endogenous GPR25 polypeptide can be inhibited using
expression blocking techniques. Known such techniques
involve the use of antisense sequences, either internally
generated or separately administered (see, for example,
O'Connor, J Neurochem (1991) 56:560 in Oligodeoxynucleotides
as Antisense Inhibitors of Gene Expression, CRC
Press, Boca Raton, Fla. (1988)). Alternatively, oligonucleotides
which form triple helices with the gene can be
supplied (see, for example, Lee et al., Nucleic Acids Res
(1979) 6:3073; Cooney et al., Science (1988) 241:456; Dervan
et al., Science (1991) 251:1360). These oligomers can be
administered per se or the relevant oligomers can be
expressed in vivo.

For treating abnormal conditions related to an under-expression
of GPR25 and its activity, several approaches are
also available. One approach comprises administering to a
subject a therapeutically effective amount of a compound
which activates a polypeptide of the present invention, i.e.,
an agonist as described above, in combination with a pharmaceutically
acceptable carrier, to thereby alleviate the
abnormal condition. Alternatively, gene therapy may be
employed to effect the endogenous production of GPR25 by
the relevant cells in the subject. For example, a polynucleotide
of the invention may be engineered for expression in
a replication defective retroviral vector, as discussed above.
The retroviral expression construct may then be isolated and
introduced into a packaging cell transduced with a retroviral
plasmid vector containing RNA encoding a polypeptide of
the present invention such that the packaging cell now
produces infectious viral particles containing the gene of
interest. These producer cells may be administered to a
subject for engineering cells in vivo and expression of the
polypeptide in vivo. For an overview of gene therapy, see
Chapter 20, Gene Therapy and other Molecular Genetic-based
Therapeutic Approaches, (and references cited
therein) in Human Molecular Genetics, T Strachan and A P
Read, BIOS Scientific Publishers Ltd (1996). Another
approach is to administer a therapeutic amount of a polypeptide
of the present invention in combination with a suitable
pharmaceutical carrier. 

In a further aspect, the present invention provides for 
pharmaceutical compositions comprising a therapeutically 
effective amount of a polypeptide, such as the soluble form 
of a polypeptide of the present invention, agonist/antagonist
peptide or small molecule compound, in combination with a 
pharmaceutically acceptable carrier or excipient. Such car- 
riers include, but are not limited to, saline, buffered saline, 
dextrose, water, glycerol, ethanol, and combinations thereof. 
The invention further relates to pharmaceutical packs and
kits comprising one or more containers filled with one or 
more of the ingredients of the aforementioned compositions 
of the invention. Polypeptides and other compounds of the 
present invention may be employed alone or in conjunction 
with other compounds, such as therapeutic compounds.

The composition will be adapted to the route of 
administration, for instance by a systemic or an oral route. 
Preferred forms of systemic administration include 
injection, typically by intravenous injection. Other injection 
routes, such as subcutaneous, intramuscular, or
intraperitoneal, can be used. Alternative means for systemic 
administration include transmucosal and transdermal admin- 
istration using penetrants such as bile salts or fusidic acids 
or other detergents. In addition, if a polypeptide or other 
compounds of the present invention can be formulated in an
enteric or an encapsulated formulation, oral administration 
may also be possible. Administration of these compounds 
may also be topical and/or localized, in the form of salves, 
pastes, gels, and the like. 

The dosage range required depends on the choice of
peptide or other compounds of the present invention, the 
route of administration, the nature of the formulation, the 
nature of the subject's condition, and the judgment of the 
attending practitioner. Suitable dosages, however, are in the
range of 0.1-100 μg/kg of subject. Wide variations in the
needed dosage, however, are to be expected in view of the 
variety of compounds available and the differing efficiencies 
of various routes of administration. For example, oral 
administration would be expected to require higher dosages 
than administration by intravenous injection. Variations in
these dosage levels can be adjusted using standard empirical 
routines for optimization, as is well understood in the art. 

Polypeptides used in treatment can also be generated 
endogenously in the subject, in treatment modalities often
referred to as "gene therapy" as described above. Thus, for
example, cells from a subject may be engineered with a
polynucleotide, such as a DNA or RNA, to encode a
polypeptide ex vivo, for example, by the use of a
retroviral plasmid vector. The cells are then introduced into 
the subject. 

Polynucleotide and polypeptide sequences form a valu- 
able information resource with which to identify further 
sequences of similar homology. This is most easily facili- 
tated by storing the sequence in a computer readable 
medium and then using the stored data to search a sequence 
database using well known searching tools, such as GCG.
Accordingly, in a further aspect, the present invention pro- 
vides for a computer readable medium having stored thereon 
a polynucleotide comprising the sequence of SEQ ID NO:1
and/or a polypeptide sequence encoded thereby. 
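
As an illustration of storing a sequence on a computer readable medium and reading it back for later searching, the following is a minimal sketch in Python. It assumes the widely used FASTA text layout (a ">" header line followed by wrapped sequence lines); the helper names, the record name and the short sequence are hypothetical placeholders and are not SEQ ID NO:1.

```python
import io


def write_fasta(handle, name, sequence, width=60):
    """Write one record in FASTA layout: a '>' header, then wrapped lines."""
    handle.write(f">{name}\n")
    for start in range(0, len(sequence), width):
        handle.write(sequence[start:start + width] + "\n")


def read_fasta(handle):
    """Read FASTA records from a text handle into a {name: sequence} dict."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Placeholder record (hypothetical; not the sequence of the invention).
buffer = io.StringIO()
write_fasta(buffer, "placeholder_record", "ACGT" * 20, width=30)
buffer.seek(0)
stored = read_fasta(buffer)
```

A stored record of this kind could then be supplied as the query to whichever database search tool is chosen; the search step itself is outside this sketch.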

The following definitions are provided to facilitate under- 
standing of certain terms used frequently hereinbefore. 

"Antibodies" as used herein includes polyclonal and 
monoclonal antibodies, chimeric, single chain, and human- 
ized antibodies, as well as Fab fragments, including the 
products of an Fab or other immunoglobulin expression
library.

"Isolated" means altered "by the hand of man" from the 
natural state. If an "isolated" composition or substance 
occurs in nature, it has been changed or removed from its 
original environment, or both. For example, a polynucle- 
otide or a polypeptide naturally present in a living animal is 
not "isolated," but the same polynucleotide or polypeptide 
separated from the coexisting materials of its natural state is
"isolated", as the term is employed herein.

"Polynucleotide" generally refers to any polyribonucleotide
or polydeoxyribonucleotide, which may be unmodified
RNA or DNA or modified RNA or DNA. "Polynucleotides" 
include, without limitation, single- and double-stranded 
DNA, DNA that is a mixture of single- and double-stranded 
regions, single- and double-stranded RNA, and RNA that is a
mixture of single- and double-stranded regions, hybrid mol- 
ecules comprising DNA and RNA that may be single- 
stranded or, more typically, double-stranded or a mixture of 
single- and double-stranded regions. In addition, "poly- 
nucleotide" refers to triple-stranded regions comprising 
RNA or DNA or both RNA and DNA. The term "polynucle- 
otide" also includes DNAs or RNAs containing one or more 
modified bases and DNAs or RNAs with backbones modi- 
fied for stability or for other reasons. "Modified" bases 
include, for example, tritylated bases and unusual bases such 
as inosine. A variety of modifications may be made to DNA 
and RNA; thus, "polynucleotide" embraces chemically, 
enzymatically or metabolically modified forms of poly- 
nucleotides as typically found in nature, as well as the 
chemical forms of DNA and RNA characteristic of viruses 
and cells. "Polynucleotide" also embraces relatively short 
polynucleotides, often referred to as oligonucleotides. 

"Polypeptide" refers to any peptide or protein comprising 
two or more amino acids joined to each other by peptide 
bonds or modified peptide bonds, i.e., peptide isosteres. 
"Polypeptide" refers to both short chains, commonly 
referred to as peptides, oligopeptides or oligomers, and to 
longer chains, generally referred to as proteins. Polypeptides 
may contain amino acids other than the 20 gene-encoded 
amino acids. "Polypeptides" include amino acid sequences 
modified either by natural processes, such as post- 
translational processing, or by chemical modification tech- 
niques which are well known in the art. Such modifications 
are well described in basic texts and in more detailed 
monographs, as well as in a voluminous research literature. 
Modifications may occur anywhere in a polypeptide, includ- 
ing the peptide backbone, the amino acid side-chains and the 
amino or carboxyl termini. It will be appreciated that the 
same type of modification may be present to the same or
varying degrees at several sites in a given polypeptide. Also, 
a given polypeptide may contain many types of modifica- 
tions. Polypeptides may be branched as a result of 
ubiquitination, and they may be cyclic, with or without 
branching. Cyclic, branched and branched cyclic polypeptides
may result from post-translation natural processes or
may be made by synthetic methods. Modifications include 
acetylation, acylation, ADP-ribosylation, amidation, cova- 
lent attachment of flavin, covalent attachment of a heme 
moiety, covalent attachment of a nucleotide or nucleotide 
derivative, covalent attachment of a lipid or lipid derivative, 
covalent attachment of phosphatidylinositol, cross-linking,
cyclization, disulfide bond formation, demethylation, formation
of covalent cross-links, formation of cystine, formation
of pyroglutamate, formylation, gamma-carboxylation,
glycosylation, GPI anchor formation, hydroxylation, 
iodination, methylation, myristoylation, oxidation, pro- 
teolytic processing, phosphorylation, prenylation, 
racemization, selenoylation, sulfation, transfer-RNA medi- 
ated addition of amino acids to proteins such as arginylation, 
and ubiquitination (see, for instance, PROTEINS — 
STRUCTURE AND MOLECULAR PROPERTIES, 2nd 
Ed., T. E. Creighton, W. H. Freeman and Company, New
York, 1993; Wold, F., Post-translational Protein Modifica- 
tions: Perspectives and Prospects, pgs. 1-12 in POST- 
TRANSLATIONAL COVALENT MODIFICATION OF 
PROTEINS, B. C. Johnson, Ed., Academic Press, New 
York, 1983; Seifter et al., "Analysis for protein modifications
and nonprotein cofactors", Meth Enzymol (1990)
182:626-646 and Rattan et al., "Protein Synthesis: Post-translational
Modifications and Aging", Ann NY Acad Sci
(1992) 663:48-62). 

"Variant" refers to a polynucleotide or polypeptide that 
differs from a reference polynucleotide or polypeptide, but 
retains essential properties. A typical variant of a polynucle- 
otide differs in nucleotide sequence from another, reference 
polynucleotide. Changes in the nucleotide sequence of the 
variant may or may not alter the amino acid sequence of a 
polypeptide encoded by the reference polynucleotide. 
Nucleotide changes may result in amino acid substitutions, 
additions, deletions, fusions and truncations in the polypep- 
tide encoded by the reference sequence, as discussed below. 
A typical variant of a polypeptide differs in amino acid 
sequence from another, reference polypeptide. Generally, 
differences are limited so that the sequences of the reference 
polypeptide and the variant are closely similar overall and, 
in many regions, identical. A variant and reference polypep- 
tide may differ in amino acid sequence by one or more 
substitutions, additions, or deletions in any combination. A
substituted or inserted amino acid residue may or may not be
one encoded by the genetic code. A variant of a polynucleotide
or polypeptide may be naturally occurring, such as an
allelic variant, or it may be a variant that is not known to 
occur naturally. Non-naturally occurring variants of poly- 
nucleotides and polypeptides may be made by mutagenesis 
techniques or by direct synthesis. 

"Identity" is a measure of the identity of nucleotide 
sequences or amino acid sequences. In general, the 
sequences are aligned so that the highest order match is 
obtained. "Identity" per se has an art-recognized meaning 
and can be calculated using published techniques (see, e.g.:
COMPUTATIONAL MOLECULAR BIOLOGY, Lesk, A.
M., ed., Oxford University Press, New York, 1988; BIOCOMPUTING:
INFORMATICS AND GENOME
PROJECTS, Smith, D. W., ed., Academic Press, New York,
1993; COMPUTER ANALYSIS OF SEQUENCE DATA,
PART 1, Griffin, A. M., and Griffin, H. G., eds., Humana
Press, New Jersey, 1994; SEQUENCE ANALYSIS IN
MOLECULAR BIOLOGY, von Heinje, G., Academic
Press, 1987; and SEQUENCE ANALYSIS PRIMER,
Gribskov, M. and Devereux, J., eds., M Stockton Press, New
York, 1991). While there exist a number of methods to
measure identity between two polynucleotide or polypeptide
sequences, the term "identity" is well known to skilled
artisans (Carillo, H., and Lipton, D., SIAM J Applied Math
(1988) 48:1073). Methods commonly employed to determine
identity or similarity between two sequences include,
but are not limited to, those disclosed in Guide to Huge
Computers, Martin J. Bishop, ed., Academic Press, San
Diego, 1994, and Carillo, H., and Lipton, D., SIAM J
Applied Math (1988) 48:1073. Methods to determine identity
and similarity are codified in computer programs. Preferred
computer program methods to determine identity and
similarity between two sequences include, but are not limited
to, GCG program package (Devereux, J., et al., Nucleic
Acids Research (1984) 12(1):387), BLASTP, BLASTN, and
FASTA (Atschul, S. F. et al., J Molec Biol (1990) 215:403).
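
To make the notion of percent identity concrete, the following is a minimal sketch in Python, not the method of any of the programs named above. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it assumes the denominator is the full alignment length; other conventions (for example, dividing by the shorter sequence length) give different values. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumes pre-aligned input of equal length with '-' as the gap symbol.
    Columns containing a gap never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignments (hypothetical data): three of four columns match in each.
substitution_case = percent_identity("ACGT", "ACGA")
gap_case = percent_identity("AC-T", "ACGT")
```

Producing the alignment itself, so that "the highest order match is obtained", is the job of the alignment programs cited above and is not attempted here.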

By way of example, a polynucleotide sequence of the
present invention may be identical to the reference sequence
of SEQ ID NO:1, that is be 100% identical, or it may include
up to a certain integer number of nucleotide alterations as
compared to the reference sequence. Such alterations are
selected from the group consisting of at least one nucleotide
deletion, substitution, including transition and transversion,
or insertion, and wherein said alterations may occur at the 5'
or 3' terminal positions of the reference nucleotide sequence
or anywhere between those terminal positions, interspersed
either individually among the nucleotides in the reference
sequence or in one or more contiguous groups within the
reference sequence. The number of nucleotide alterations is
determined by multiplying the total number of nucleotides in
SEQ ID NO:1 by the numerical percent of the respective
percent identity (divided by 100) and subtracting that product
from said total number of nucleotides in SEQ ID NO:1,
or:

$$n_n \le x_n - (x_n \cdot y),$$

wherein $n_n$ is the number of nucleotide alterations, $x_n$ is the
total number of nucleotides in SEQ ID NO:1, and y is 0.50
for 50%, 0.60 for 60%, 0.70 for 70%, 0.80 for 80%, 0.85 for
85%, 0.90 for 90%, 0.95 for 95%, 0.97 for 97% or 1.00 for
100%, and wherein any non-integer product of $x_n$ and y is
rounded down to the nearest integer prior to subtracting it
from $x_n$. Alterations of a polynucleotide sequence encoding
the polypeptide of SEQ ID NO:2 may create nonsense,
missense or frameshift mutations in this coding sequence
and thereby alter the polypeptide encoded by the polynucleotide
following such alterations.
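
The rounding rule in this calculation can be illustrated with a short Python sketch. The function name and the example lengths are hypothetical (they are not the length of SEQ ID NO:1); the percent identity is passed as a whole-number percent so that the "round down, then subtract" step is done in exact integer arithmetic. The same arithmetic applies to amino acid counts.

```python
def max_alterations(total_residues: int, identity_percent: int) -> int:
    """Largest number of alterations allowed at a given percent identity.

    Implements: total - floor(total * y), where y = identity_percent / 100.
    Integer floor division avoids floating-point rounding surprises.
    """
    if total_residues < 0:
        raise ValueError("total_residues must be non-negative")
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity_percent must be between 0 and 100")
    rounded_down_product = (total_residues * identity_percent) // 100
    return total_residues - rounded_down_product


# Hypothetical lengths chosen only to show the rounding behaviour.
exact_case = max_alterations(1000, 95)    # 1000 * 0.95 = 950 exactly
rounded_case = max_alterations(333, 90)   # 333 * 0.90 = 299.7, rounded down to 299
identical_case = max_alterations(100, 100)
```

Because the product is rounded down before subtraction, a non-integer product always resolves in favour of one more permitted alteration, as in the second example.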

Similarly, a polypeptide sequence of the present invention
may be identical to the reference sequence of SEQ ID NO:2,
that is be 100% identical, or it may include up to a certain
integer number of amino acid alterations as compared to the
reference sequence such that the % identity is less than
100%. Such alterations are selected from the group consisting
of at least one amino acid deletion, substitution, including
conservative and non-conservative substitution, or
insertion, and wherein said alterations may occur at the
amino- or carboxy-terminal positions of the reference
polypeptide sequence or anywhere between those terminal
positions, interspersed either individually among the amino
acids in the reference sequence or in one or more contiguous
groups within the reference sequence. The number of amino
acid alterations for a given % identity is determined by
multiplying the total number of amino acids in SEQ ID
NO:2 by the numerical percent of the respective percent
identity (divided by 100) and then subtracting that product
from said total number of amino acids in SEQ ID NO:2, or:

$$n_a \le x_a - (x_a \cdot y),$$

wherein $n_a$ is the number of amino acid alterations, $x_a$ is the
total number of amino acids in SEQ ID NO:2, and y is, for
instance 0.70 for 70%, 0.80 for 80%, 0.85 for 85% etc., and
wherein any non-integer product of $x_a$ and y is rounded
down to the nearest integer prior to subtracting it from $x_a$.

"Fusion protein" refers to a protein encoded by two, often
unrelated, fused genes or fragments thereof. In one example,
EP-A-0 464 discloses fusion proteins comprising various
portions of constant region of immunoglobulin molecules
together with another human protein or part thereof. In many
cases, employing an immunoglobulin Fc region as a part of
a fusion protein is advantageous for use in therapy and
diagnosis resulting in, for example, improved pharmacokinetic
properties [see, e.g., EP-A 0232 262]. On the other
hand, for some uses it would be desirable to be able to delete
the Fc part after the fusion protein has been expressed,
detected and purified.

All publications, including but not limited to patents and
patent applications, cited in this specification are herein
incorporated by reference as if each individual publication

throughput format. The purified ligand for a receptor is
radiolabeled to high specific activity (50-2000 Ci/mmol) for
binding studies. A determination is then made that the
process of radiolabeling does not diminish the activity of the
ligand towards its receptor. Assay conditions for buffers,
ions, pH and other modulators such as nucleotides are
optimized to establish a workable signal to noise ratio for
both membrane and whole cell receptor sources. For these
assays, specific receptor binding is defined as total associated
radioactivity minus the radioactivity measured in the
presence of an excess of unlabeled competing ligand. Where
possible, more than one competing ligand is used to define
residual nonspecific binding.

Example 4

Functional Assay in Xenopus Oocytes

Capped RNA transcripts from linearized plasmid templates
encoding the receptor cDNAs of the invention are
synthesized in vitro with RNA polymerases in accordance
with standard procedures. In vitro transcripts are suspended
in water at a final concentration of 0.2 mg/ml. Ovarian lobes
are removed from adult female toads. Stage V defolliculated
oocytes are obtained, and RNA transcripts (10 ng/oocyte)
are injected in a 50 nl bolus using a microinjection appara-

were specifically and individually indicated to be incorpo- tus. Two electrode voltage clamps are used to measure the 

rated by reference herein as though fully set forth. currents from individual Xenopus oocytes in response to 

EXAMPLES agonist exposure. Recordings are made in Ca2+ free Barth*s 

medium at room temperature. The Xenopus system can be 
Example 1 30 used to screen known ligands and tissue/cell extracts for 

Mammalian Cell Expression activating figands. 

'Ilie receptors of the present invention are expressed in Example 5 

either human embryonic kidney 293 (HEK293) cells or 

adherent dhfr CHO cells. To maximize receptor expression, Microphysiometric Assays 

typically all 5' and 3' untranslated regions (UTRs) are * t a ■ . c a 

•"^ J - . r^xTA T • \' • . Activation of a wide variety of secondary messenger 

removed from the receptor cDNA prior to insertion mto a . . r n , c -a c 

^T^xT ^T^KT ^ fi L systcms results m extrusion or small amounts of acid from 

pCDN or PCDNA3 vector. The cells are transfected with /^^^ ^^.^ ^^^^^ j j ^ ^^^^U increased 

mdividual receptor cDNAs by hpofectm and selected in the n^etabolic activity required to fuel the intraceUular signaling 

presence of 400 mg/ml G418. After 3 weeks of selecUon, p^cess. The pH changes in the media surrounding the cell 

mdividual clones are picked and expanded for further analy- ^re very small but are detectable by the CYTOSENSOR 

sis. HEK293 or CHO cells transfected with the vector alone microphysiometer (Molecular Devices Ltd., Menlo Park, 

serve as negative controls. To isolate ceU lines stably Calif.). The CYTOSENSOR is thus capable of detecting the 

expressing the individual receptors, about 24 clones are activation of a receptor which is coupled to an energy 

typically selected and analyzed by Northern blot analysis. utilizing intracellular signaling pathway such as the 

Receptor raRNAs are generally detectable in about 50% of G-protein coupled receptor of the present invention, 
the G418-resistant clones analyzed. 



Example 2 



Example 6 



Ligand bank for binding and functional assays Extract/Cell Supernatant Screening 

A bank of over 200 putative receptor ligands has been ^ i^^ge number of mammaUan receptors exist for which 

assembled for screening. The bank comprises: transmitters, there remains, as yet, no cognate activating ligand (agonist), 

hormones and chemokines known to act via a human seven Thus, active ligands for these receptors may not be included 

transmembrane (7TM) receptor; naturally occurring com- within the ligands banks as identified to date. Accordingly, 
pounds which may be putative agonists for a human 71^ 55 the 7TM receptor of the invention is also functionally 

receptor, non-mammalian, biologically active peptides for screened (using calcium, cAMP, microphysiometer, oocyte 

which a mammalian counterpart has not yet been identified; electrophysiology, etc., functional screens) against tissue 

and compounds not found in nature, but which activate 7TM extracts to identify natural ligands. Extracts that produce 

receptors with unknown natural ligands. This bank is used to positive functional responses can be sequentially subfirac- 

initially screen the receptor for known ligands, using both 59 ^io°^^^^ ^ activating ligand is isolated and identified, 
functional (i.e . calcium, cAMP, microphysiometer, oocyte 

electrophysiology, etc, sec below) as well as binding assays. Example 8 

Example 3 Calcium and cAMP Functional Assays 

Ligand Bmdmg Assays receptors which are expressed in HEK 293 cells 

Ligand binding assays provide a direct method for ascer- have been shown to be coupled functionally to activation of 

taining receptor pharmacology and are adaptable to a high PLC and calcium mobilization and/or cAMP stimulation or 



• 
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inhibition. Basal calcium levels in the HEK 293 cells in 
receptor-transfected or vector control cells were observed to 
be in the normal, 100 nM to 200 nM, range. HEK 293 cells 
expressing recombinant receptors are loaded with fura 2 and 
in a single day > 150 selected Ugands or tissue/cell extracts 
are evaluated for agonist induced calcium mobilization. 
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Similarly, HEK 293 cells expressing recombinant receptors 
are evaluated for the stimulation or inhibition of cAMP 
production using standard cAMP quantitation assays. Ago- 
nists presenting a calcium transient or cAMP flucuation are 
tested in vector control cells to determine if the response is 
unique to the transfected cells expressing receptor. 
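Two quantities in the examples above reduce to simple arithmetic: the definition of specific binding in Example 3 and the RNA dose per oocyte in Example 4. The sketch below is illustrative only. The function names are hypothetical, the radioactivity counts are invented placeholder values, and the only figures taken from the text are the 0.2 mg/ml transcript concentration and the 50 nl bolus.

```python
def specific_binding(total: float, nonspecific: float) -> float:
    """Specific binding: total associated radioactivity minus the radioactivity
    measured in the presence of excess unlabeled competing ligand."""
    return total - nonspecific


def injected_rna_ng(conc_mg_per_ml: float, bolus_nl: float) -> float:
    """Mass of RNA delivered per oocyte, in ng (1 mg/ml equals 1 ng/nl)."""
    return conc_mg_per_ml * bolus_nl


if __name__ == "__main__":
    # Hypothetical counts (cpm), for illustration only.
    print(specific_binding(total=12000.0, nonspecific=1500.0))
    # Values stated in Example 4: 0.2 mg/ml transcript, 50 nl bolus.
    print(injected_rna_ng(0.2, 50.0))
```

With the stated concentration and bolus volume, the computed dose agrees with the 10 ng/oocyte given in Example 4.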



SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(iii) NUMBER OF SEQUENCES: 2 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1174 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AGAGCAAACC CCCTCCTGCT CAGAGCTGCT GCCGCCTGCG CCCAGGGCTG CACTCCGCGC 60 

AGGCCTCATA GCCAGGCCAT GGCCCCCACA GAGCCCTGGA GCCCCAGCCC GGGGTCAGCG 120 

CCCTGGGACT ACTCGGGGTT GGACGGCCTG GAGGAGCTGG AGCTGTGTCC GGCCGGGGAC 180 

CTGCCCTACG GCTACGTCTA CATCCCCGCG CTCTACCTGG CGGCCTTCGC CGTGGGCCTG 240 

CTGGGCAACG CCTTTGTGGT GTGGCTGCTG GCCGGGCGGC GGGGCCCGCG GCGGCTGGTG 300 

GATACCTTCG TGCTGCACCT GGCGGCAGCT GACCTGGGCT TCGTGCTCAC GCTGCCGCTG 360 

TGGGCCGCGG CGGCGGCGCT AGGCGGCCGC TGGCCGTTCG GCGATGGCCT CTGCAAGCTC 420 

AGCAGCTTCG CGCTGGCGGG CACGCGCTGC GCGGGCGCGC TGCTGCTGGC GGGCATGAGC 480 

GTGGACCGCT ACCTGGCCGT GGTGAAGCTG CTCGAGGCGA GGCCACTGCG CACCCCGCGC 540 

TGCGCGCTGG CCTCGTGCTG CGGCGTCTGG GCCGTGGCGC TGCTGGCCGG CCTGCCCTCC 600 

CTGGTCTACC GGGGGTTGCA GCCCCTGCCT GGGGGCCAGG ACAGCCAGTG CGGCGAGGAG 660 

CCCTCCCACG CCTTCCAGGG CCTCAGCTTG CTGCTGCTGC TGCTGACCTT CGTGCTGCCC 720 

CTGGTCGTCA CCCTCTTCTG CTACTGCCGC ATCTCGCGCC GCCTGCGACG GCCGCCGCAC 780

GTGGGTCGGG CCCGGAGGAA CTCGCTGCGC ATCATCTTCG CCATCGAGAG CACGTTTGTG 840 

GGCTCCTGGC TGCCCTTCAG CGCCCTGCGG GCCGTCTTCC ACCTGGCGCG TCTGGGGGCG 900 

CTGCCGCTGC CGTGCCCCCT GCTGCTGGCG CTGCGCTGGG GCCTCACCAT TGCCACCTGC 960 

CTGGCCTTCG TCAACAGCTG CGCCAACCCG CTCATCTACC TCCTGCTGGA CCGCTCATTC 1020 

CGAGCCCGGG CGCTGGACGG GGCCTGCGGG CGCACCGGCC GCCTGGCGCG AAGGATCAGC 1080 

TCAGCCTCCT CGCTCTCCAG GGACGACAGT TCCGTGTTCC GTTGCCGGGC CCAGGCCGCG 1140 

AACACTGCCT CGGCCTCCTG GTAGAAGCTT CGGG 1174 
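As a consistency check between the two sequences of this listing, the start of the coding region can be translated with the standard genetic code. The sketch below is an illustration under stated assumptions, not part of the listing: it uses only the first two rows of SEQ ID NO:1 (nucleotides 1 to 120), assumes the open reading frame begins at nucleotide 79 (the first position of the range recited in claim 3), and uses a hypothetical helper named `translate`.

```python
from itertools import product

# First two rows (nucleotides 1-120) of SEQ ID NO:1, copied from the listing.
SEQ1_HEAD = (
    "AGAGCAAACC CCCTCCTGCT CAGAGCTGCT GCCGCCTGCG CCCAGGGCTG CACTCCGCGC"
    "AGGCCTCATA GCCAGGCCAT GGCCCCCACA GAGCCCTGGA GCCCCAGCCC GGGGTCAGCG"
).replace(" ", "")

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int) -> str:
    """Translate from 1-based position `start` until a stop codon or the end."""
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate(SEQ1_HEAD, 79))      # one-letter codes for residues 1-14
    print((1161 - 79 + 1) // 3)          # number of codons in nucleotides 79-1161
```

The fourteen residues obtained this way (MAPTEPWSPSPGSA) correspond to Met Ala Pro Thr Glu Pro Trp Ser Pro Ser Pro Gly Ser Ala, and the range 79 to 1161 spans 361 codons, matching the stated length of SEQ ID NO:2.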

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 361 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein








(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ala Pro Thr Glu Pro Trp Ser Pro Ser 10
Pro Gly Ser Ala Pro Trp Asp Tyr Ser Gly 20
Leu Asp Gly Leu Glu Glu Leu Glu Leu Cys 30
Pro Ala Gly Asp Leu Pro Tyr Gly Tyr Val 40
Tyr Ile Pro Ala Leu Tyr Leu Ala Ala Phe 50
Ala Val Gly Leu Leu Gly Asn Ala Phe Val 60
Val Trp Leu Leu Ala Gly Arg Arg Gly Pro 70
Arg Arg Leu Val Asp Thr Phe Val Leu His 80
Leu Ala Ala Ala Asp Leu Gly Phe Val Leu 90
Thr Leu Pro Leu Trp Ala Ala Ala Ala Ala 100
Leu Gly Gly Arg Trp Pro Phe Gly Asp Gly 110
Leu Cys Lys Leu Ser Ser Phe Ala Leu Ala 120
Gly Thr Arg Cys Ala Gly Ala Leu Leu Leu 130
Ala Gly Met Ser Val Asp Arg Tyr Leu Ala 140
Val Val Lys Leu Leu Glu Ala Arg Pro Leu 150
Arg Thr Pro Arg Cys Ala Leu Ala Ser Cys 160
Cys Gly Val Trp Ala Val Ala Leu Leu Ala 170
Gly Leu Pro Ser Leu Val Tyr Arg Gly Leu 180
Gln Pro Leu Pro Gly Gly Gln Asp Ser Gln 190
Cys Gly Glu Glu Pro Ser His Ala Phe Gln 200
Gly Leu Ser Leu Leu Leu Leu Leu Leu Thr 210
Phe Val Leu Pro Leu Val Val Thr Leu Phe 220
Cys Tyr Cys Arg Ile Ser Arg Arg Leu Arg 230
Arg Pro Pro His Val Gly Arg Ala Arg Arg 240
Asn Ser Leu Arg Ile Ile Phe Ala Ile Glu 250
Ser Thr Phe Val Gly Ser Trp Leu Pro Phe 260
Ser Ala Leu Arg Ala Val Phe His Leu Ala 270
Arg Leu Gly Ala Leu Pro Leu Pro Cys Pro 280
Leu Leu Leu Ala Leu Arg Trp Gly Leu Thr 290
Ile Ala Thr Cys Leu Ala Phe Val Asn Ser 300
Cys Ala Asn Pro Leu Ile Tyr Leu Leu Leu 310
Asp Arg Ser Phe Arg Ala Arg Ala Leu Asp 320
Gly Ala Cys Gly Arg Thr Gly Arg Leu Ala 330
Arg Arg Ile Ser Ser Ala Ser Ser Leu Ser 340
Arg Asp Asp Ser Ser Val Phe Arg Cys Arg 350
Ala Gln Ala Ala Asn Thr Ala Ser Ala Ser 360
Trp 361

What is claimed is: 

1. An isolated polynucleotide comprising a nucleotide
sequence encoding the polypeptide comprising the amino
acid sequence as set forth in SEQ ID NO:2.

2. The isolated polynucleotide of claim 1 comprising the
polynucleotide as set forth in SEQ ID NO:1.

3. The isolated polynucleotide of claim 1 comprising
nucleotides 79 to 1161 of the polynucleotide as set forth in
SEQ ID NO:1.

4. An isolated expression vector comprising a polynucleotide
encoding a polypeptide having the amino acid
sequence as set forth in SEQ ID NO:2.

5. A process for producing a recombinant host cell comprising
transforming or transfecting a host cell with the
expression vector of claim 4 such that the host cell, under
appropriate culture conditions, produces a polypeptide comprising
the amino acid sequence as set forth in SEQ ID
NO:2.

6. A recombinant host cell produced by the process of 
claim 5. 

7. An isolated membrane of the recombinant host cell of 
claim 6 expressing a polypeptide comprising the amino acid 
sequence as set forth in SEQ ID NO: 2. 

8. A process for producing a polypeptide having the amino
acid sequence as set forth in SEQ ID NO:2 comprising
culturing the host cell of claim 6 under conditions sufficient
for the production of said polypeptide and recovering said
polypeptide from the culture.

9. An isolated polynucleotide which is fully complementary
to the nucleotide sequence encoding the amino acid
sequence as set forth in SEQ ID NO:2.

10. An isolated polynucleotide which is fully complementary
to the polynucleotide as set forth in SEQ ID NO:1.

11. An isolated polynucleotide which is fully complementary
to nucleotides 79 to 1161 of the polynucleotide as set
forth in SEQ ID NO:1.
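For orientation, the sequence of a fully complementary strand can be derived mechanically from a reference strand. The sketch below is an illustration only and is not part of the claims: the helper name `reverse_complement` is hypothetical, the convention of writing the complementary strand 5' to 3' (that is, as the reverse complement) is an assumption, and the input is the first ten nucleotides of SEQ ID NO:1.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    head = "AGAGCAAACC"  # nucleotides 1-10 of SEQ ID NO:1
    print(reverse_complement(head))
```

Applying the function twice returns the original strand, which is a convenient sanity check when working with longer sequences.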

* * * * *



